pdimens / harpy

Process raw haplotagging data, from raw sequences to phased haplotypes, batteries included.
https://pdimens.github.io/harpy
GNU General Public License v3.0

harpy impute error IsADirectoryError: #68

Open gmkov opened 5 months ago

gmkov commented 5 months ago

Describe the bug

Thank you for this great resource. I realise it is still under development, so I am unsure whether to expect modules to run smoothly.

After some initial teething issues with sample names in my own VCF file, created with bcftools mpileup outside harpy, I ran into a cryptic conda issue. So I went a step back and used the harpy snp mpileup module to obtain a BCF. It finished with an error (I think it was just unable to compile the BCF HTML report), but the files looked OK within SNP/mpileup/. I then tried imputation with harpy impute, using this file as input.

my command is:

harpy impute --threads 10 --parameters stitch.params \
--vcf SNP/mpileup/variants.raw.bcf \
bams 2> log.impute

my stitch param file is:

model   usebx   bxlimit k       s       ngen
diploid TRUE    50000   30      1       500
diploid FALSE   50000   30      1       500
diploid TRUE    50000   40      1       500
diploid FALSE   50000   40      1       500
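
A quick sanity check of a parameter table like the one above can rule out formatting problems before a run. The following is a minimal sketch, not harpy's own validation: the expected column names are copied from the header in this post, and `read_params` is a hypothetical helper name.

```python
import io

# Column names taken from the header shown in this post (an assumption,
# not checked against harpy's documentation).
EXPECTED_COLUMNS = ["model", "usebx", "bxlimit", "k", "s", "ngen"]


def read_params(handle):
    """Parse a whitespace-delimited parameter table into a list of dicts."""
    header = handle.readline().split()
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    rows = []
    for lineno, line in enumerate(handle, start=2):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"line {lineno}: expected {len(header)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(header, fields)))
    return rows


example = io.StringIO(
    "model   usebx   bxlimit k       s       ngen\n"
    "diploid TRUE    50000   30      1       500\n"
    "diploid FALSE   50000   30      1       500\n"
)
for row in read_params(example):
    print(row)
```

Splitting on any whitespace means the check accepts both tab- and space-delimited files; all values are kept as strings.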

This produces the same cryptic conda error as before, which I paste below.

I also ran harpy preflight bam bams, and the whole Snakemake workflow builds a DAG correctly, seems to create/activate conda environments fine, and runs correctly (after I manually installed flexdashboard with install.packages("flexdashboard"); its absence was the initial error). So the preflight checks with the BAM files were OK.

Harpy Version

0.9.1

File that triggers the error (if applicable)

No response

Harpy error log

(harpy) [mgm49@login-n-1 harpy]$ harpy impute --threads 10 --parameters stitch.params \
--vcf SNP/mpileup/variants.raw.bcf \
bams 2> log.impute

(harpy) [mgm49@login-n-1 harpy]$ less log.impute 
Traceback (most recent call last):
  File "/home/mgm49/miniconda3/envs/harpy/bin/harpy", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/mgm49/miniconda3/envs/harpy/lib/python3.12/site-packages/harpy/__main__.py", line 383, in main
    cli()
  File "/home/mgm49/miniconda3/envs/harpy/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgm49/miniconda3/envs/harpy/lib/python3.12/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/mgm49/miniconda3/envs/harpy/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgm49/miniconda3/envs/harpy/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgm49/miniconda3/envs/harpy/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgm49/miniconda3/envs/harpy/lib/python3.12/site-packages/harpy/impute.py", line 66, in impute
    fetch_file(f"{i}.Rmd", f"{workflowdir}/report/")
  File "/home/mgm49/miniconda3/envs/harpy/lib/python3.12/site-packages/harpy/helperfunctions.py", line 124, in fetch_file
    shutil.copy2(result, destination)
  File "/home/mgm49/miniconda3/envs/harpy/lib/python3.12/shutil.py", line 475, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/home/mgm49/miniconda3/envs/harpy/lib/python3.12/shutil.py", line 260, in copyfile
    with open(src, 'rb') as fsrc:
         ^^^^^^^^^^^^^^^
IsADirectoryError: [Errno 21] Is a directory: '/rds/project/cj107/rds-cj107-jiggins-rds/projects/mgm49/helicoverpa/09.snakemake.batch1.haplo.parse.barcodes/harpy/Impute'


gmkov commented 4 months ago

let me know if there is anything i can do to help. is there any way we could go back to the harpy dev version whereby if i installed flexdashboard manually it would generate the report? https://github.com/pdimens/harpy/issues/68#issuecomment-2090658645 - at some point yesterday (and before) this worked and they are fantastic

The good news is that harpy impute now runs when --skipreports is used, which was the topic of this issue. I haven't tested harpy impute with the harpydev version that did compile reports when I installed flexdashboard manually; that might work. If not, I'll try to compile them locally. If I have issues with other modules, I'll open new issues.

do you plan on updating bioconda harpy with harpydev? thanks

pdimens commented 4 months ago

Thanks for sticking this out. Harpy dev is going to be the next release, but it's not ready yet. Since I've already begun it, I'm going to continue working on Singularity env management and try to incorporate that into the next release. It may take a week or a few because I've fallen ill and probably won't work until my health improves.

pdimens commented 4 months ago

@gmkov if you're feeling dangerous, you can try the singularity branch. It's still undergoing testing, but it's being set up to use conda envs inside of a preconfigured docker/singularity container. Ideally, using harpy will pull the container once (per project directory) and the rules are executed through there when necessary. For the development version, you will need to manually install apptainer for it to work:

conda/mamba install -c conda-forge apptainer

pdimens commented 4 months ago

@gmkov it took a bit to augment the test suite to accommodate the container situation, but the tests are set up now and all modules except phase should work as expected on the singularity branch. I'll try to tackle the phase issue in the coming days.

pdimens commented 4 months ago

@gmkov all tests have passed, the singularity branch has been merged into dev and the container approach should now guarantee that the software exists for the rules

gmkov commented 4 months ago

good evening! i was NOT feeling dangerous or brave this morning, sorry. but this sounds great! so phase should work on the singularity branch? will try tomorrow, and will try preflight bam with reports :) thanks

pdimens commented 4 months ago

So long as you add apptainer to the harpydev conda environment, you should be able to pull the dev branch like before and try it. (The singularity branch was merged into dev). Otherwise, yes, all modules should have functionality. "Should"

gmkov commented 4 months ago

testing harpy dev singularity. created a new environment so that i can carry on using the old dev

### install harpydev singularity
# get the git repo
git clone https://github.com/pdimens/harpy.git
cd harpy

# switch to dev branch
git checkout dev

# create conda/mamba env - im using conda, it's harpy.yaml (no env), and activate harpydev (not harpy)
conda env create --name harpydevsing --file resources/harpy.yaml

# it NO LONGER asks me to run conda init before conda activate- GOOD (conda behaving correctly now)

# now can activate env
conda activate harpydevsing

# install everything - Successfully installed harpy-0.10.0
bash resources/buildlocal.sh

# install apptainer manually. success
conda install -c conda-forge apptainer

now to test preflight bams

# test with 6 samples
#### preflight checks
rm -R Preflight
harpy preflight bam --threads 20 --snakemake "--conda-frontend conda" bams/ &

unfortunately, hit an error:

Activating conda environment: ../../../../../../../../../../conda-envs/6e3a7dc7152f45301056861193dd3db5
INFO:    gocryptfs not found, will not be able to use gocryptfs
INFO:    gocryptfs not found, will not be able to use gocryptfs
[E::hts_open_format] Failed to open file "Preflight/bam/workflow/input/CAM046072.bam" : No such file or directory
Traceback (most recent call last):
  File "/rds/project/cj107/rds-cj107-jiggins-rds/projects/mgm49/helicoverpa/09.snakemake.batch1.haplo.parse.barcodes/harpy-dev/test6/.snakemake/scripts/tmphf2y_37n.checkBAM.py", line 18, in <module>
    alnfile = pysam.AlignmentFile(bam_in)
  File "pysam/libcalignmentfile.pyx", line 751, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 950, in pysam.libcalignmentfile.AlignmentFile._open
FileNotFoundError: [Errno 2] could not open alignment file `Preflight/bam/workflow/input/CAM046072.bam`: No such file or directory
[E::hts_open_format] Failed to open file "Preflight/bam/workflow/input/CAM046075.bam" : No such file or directory
Traceback (most recent call last):
  File "/rds/project/cj107/rds-cj107-jiggins-rds/projects/mgm49/helicoverpa/09.snakemake.batch1.haplo.parse.barcodes/harpy-dev/test6/.snakemake/scripts/tmppm_q3nih.checkBAM.py", line 18, in <module>
    alnfile = pysam.AlignmentFile(bam_in)

....

RuleException:
CalledProcessError in file /rds/project/cj107/rds-cj107-jiggins-rds/projects/mgm49/helicoverpa/09.snakemake.batch1.haplo.parse.barcodes/harpy-dev/test6/Preflight/bam/workflow/preflight-bam.smk, line 67:
Command ' singularity  exec --home '/rds/project/cj107/rds-cj107-jiggins-rds/projects/mgm49/helicoverpa/09.snakemake.batch1.haplo.parse.barcodes/harpy-dev/test6'  --bind '/home/mgm49/rds/hpc-work/home/miniconda3/envs/harpydevsing/lib/python3.12/site-packages':'/mnt/snakemake_searchpaths/item_0'  --bind '/rds/project/cj107/rds-cj107-jiggins-rds/projects/mgm49/helicoverpa/09.snakemake.batch1.haplo.parse.barcodes/harpy-dev/test6/Preflight/bam/workflow':'/mnt/snakemake_searchpaths/item_1'  --bind '/rds/user/mgm49/hpc-work/home/miniconda3/envs/harpydevsing/bin':'/mnt/snakemake_searchpaths/item_2'  --bind '/home/mgm49/rds/hpc-work/home/miniconda3/envs/harpydevsing/lib/python3.12':'/mnt/snakemake_searchpaths/item_3'  --bind '/home/mgm49/rds/hpc-work/home/miniconda3/envs/harpydevsing/lib/python3.12/lib-dynload':'/mnt/snakemake_searchpaths/item_4'  --bind '/home/mgm49/rds/hpc-work/home/miniconda3/envs/harpydevsing/lib/python3.12/site-packages':'/mnt/snakemake_searchpaths/item_5' /rds/project/cj107/rds-cj107-jiggins-rds/projects/mgm49/helicoverpa/09.snakemake.batch1.haplo.parse.barcodes/harpy-dev/test6/.snakemake/singularity/3f319597cd9434b974447523f685fc31.simg bash -c 'source /opt/conda/bin/activate '\''/conda-envs/6e3a7dc7152f45301056861193dd3db5'\''; set -euo pipefail;  python /rds/project/cj107/rds-cj107-jiggins-rds/projects/mgm49/helicoverpa/09.snakemake.batch1.haplo.parse.barcodes/harpy-dev/test6/.snakemake/scripts/tmphf2y_37n.checkBAM.py'' returned non-zero exit status 1.

It says "Preflight/bam/workflow/input/CAM046072.bam" doesn't exist, but it definitely does! This same set of files was working before with the normal dev version.

thanks

pdimens commented 4 months ago

You don't need to add a conda frontend as the container approach doesn't use your system's conda. Try it without the --snakemake option

gmkov commented 4 months ago

tried with harpy preflight bam --threads 20 bams/ & but same issues

INFO:    gocryptfs not found, will not be able to use gocryptfs
[E::hts_open_format] Failed to open file "Preflight/bam/workflow/input/CAM046072.bam" : No such file or directory
Traceback (most recent call last):
  File "/rds/project/cj107/rds-cj107-jiggins-rds/projects/mgm49/helicoverpa/09.snakemake.batch1.haplo.parse.barcodes/harpy-dev/test6/.snakemake/scripts/tmpow__6yif.checkBAM.py", line 18, in <module>
[E::hts_open_format] Failed to open file "Preflight/bam/workflow/input/CAM046070.bam" : No such file or directory

pdimens commented 4 months ago

What the actual heck is going on here; your system never ceases to impress me.

I'm kind of out of ideas. The only thing I can think of is that your bams folder might be made up of symlinks. Maybe symlinks nested too deep don't jibe with container things? I'm very new to containers and still learning about quirks. Also, were these alignments produced by Harpy or elsewhere?

In case you're using symlinks, I've made an adjustment to dev that resolves absolute paths when making symlinks into workflow/input, so that even if symlinks are provided as input, the symlinks in workflow/input (across all modules) will resolve to the absolute path of the original file, whether the input was the file itself or a symlink to another file.
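
The idea of resolving inputs before linking them can be sketched roughly as follows. This is an illustration under assumptions, not the code on dev: `link_resolved` is a hypothetical helper, and the directory names are made up for the demo.

```python
import os
import tempfile
from pathlib import Path


def link_resolved(src, input_dir):
    """Symlink src into input_dir, pointing at the fully resolved target.

    Hypothetical helper; harpy's own implementation may differ.
    """
    # resolve() follows every symlink in the chain and returns an absolute path;
    # strict=True raises FileNotFoundError if the final target is missing.
    target = Path(src).resolve(strict=True)
    input_dir = Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)
    link = input_dir / Path(src).name
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link from an earlier run
    os.symlink(target, link)
    return link


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp).resolve()
    real_bam = tmp / "data" / "sample.bam"
    real_bam.parent.mkdir()
    real_bam.write_bytes(b"")

    # The user-facing input is itself a relative symlink to the real file.
    bams = tmp / "bams"
    bams.mkdir()
    os.symlink(os.path.join("..", "data", "sample.bam"), bams / "sample.bam")

    link = link_resolved(bams / "sample.bam", tmp / "workflow" / "input")
    # The new link points straight at data/sample.bam, not at the bams/ symlink.
    print(os.readlink(link))
```

With `strict=True`, a broken input symlink fails at link time with a `FileNotFoundError` rather than later inside a rule. Whether the resolved target is visible inside a container still depends on which host paths are bound into it.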

pdimens commented 3 months ago

@gmkov FWIW, if you're still having issues, there's a new release of harpy that incorporates the containerization of things (or not, you can toggle it with --conda) and might fix the issue(s) you were having.

pdimens commented 1 month ago

@gmkov there's been a lot of internal work and several releases since we last tried to troubleshoot this. I welcome continuing this discussion and troubleshooting if/when you have the bandwidth for it :)