yyoshiaki / VIRTUS2

A bioinformatics pipeline for viral transcriptome detection and quantification considering splicing.

Wrapper problem/maybe fix? #26

Open KevinMaroney opened 1 year ago

KevinMaroney commented 1 year ago

Hello! Sorry for all my questions.

I was able to get counts for one of my 16 FASTQ files using VIRTUS.PE.cwl, but I had to run it with cwltool --singularity. When I tried to pass the .yaml file as a parameter file (for example, when downloading the indices), cwltool seemed unable to read it, so I entered each flag manually.

However, when I tried the wrapper instead of opening 16 tabs (each file takes overnight because of the 5.5-billion-read sequencing depth of my collaborators' data), I received this error:

(Virtus) [kmaroney@c0140 Virtus2]$ ~/programs/VIRTUS2/wrapper/VIRTUS_wrapper.py input.fastq.csv     --fastq     --VIRTUSDir ~/programs/VIRTUS2/     -s1 _R1_1.fastq.gz     -s2 _R2_2.fastq.gz     --genomeDir_human ~/programs/VIRTUS2/workflow/STAR_index_human     --genomeDir_virus ~/programs/VIRTUS2/workflow/STAR_index_virus     --nthreads=4
/home/kmaroney/programs/VIRTUS2/
cwltool --rm-tmpdir /home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl --fastq1 ../HS-13-25427_S26_R1_1.fastq.gz --fastq2 ../HS-13-25427_S26_R2_2.fastq.gz --genomeDir_human /home/kmaroney/programs/VIRTUS2/workflow/STAR_index_human --genomeDir_virus /home/kmaroney/programs/VIRTUS2/workflow/STAR_index_virus --outFileNamePrefix_human human --nthreads 4 --filename_output VIRTUS.output.txt

INFO /data/user/kmaroney/.conda/envs/Virtus/bin/cwltool 3.1.20230624081518
INFO Resolved '/home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl' to 'file:///home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl'
WARNING Workflow checker warning:
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:253:9: Source 'output' of
                                                                                     type ["null",
                                                                                     "File"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:303:9:   with sink 'input'
                                                                                       of type "File"
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:215:9: Source 'output_fq2'
                                                                                     of type ["null",
                                                                                     "File"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:263:9:   with sink
                                                                                       'input_fq' of type
                                                                                       "File"
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:178:9: Source
                                                                                     'mappingstats' of
                                                                                     type ["null",
                                                                                     "File"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:313:9:   with sink
                                                                                       'input_STARLog' of
                                                                                       type "File"
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:41:5:  Source
                                                                                     'filename_output'
                                                                                     of type ["null",
                                                                                     "string"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:319:9:   with sink
                                                                                       'filename_output'
                                                                                       of type "string"
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:283:9: Source 'output' of
                                                                                     type ["null",
                                                                                     "File"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:290:9:   with sink 'fq1'
                                                                                       of type "File"
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:268:9: Source 'output' of
                                                                                     type ["null",
                                                                                     "File"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:292:9:   with sink 'fq2'
                                                                                       of type "File"
INFO [workflow ] start
INFO [workflow ] starting step fastp_pe
INFO [step fastp_pe] start
ERROR Unexpected exception
Traceback (most recent call last):
  File "/data/user/kmaroney/.conda/envs/Virtus/lib/python3.11/site-packages/cwltool/pathmapper.py", line 169, in visit
    st = os.lstat(deref)
         ^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtus2/HS-13-25427_S26_R1_1.fastq.gz'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/user/kmaroney/.conda/envs/Virtus/lib/python3.11/site-packages/cwltool/workflow.py", line 459, in job
    yield from self.embedded_tool.job(
  File "/data/user/kmaroney/.conda/envs/Virtus/lib/python3.11/site-packages/cwltool/command_line_tool.py", line 990, in job
    builder.pathmapper = self.make_path_mapper(reffiles, builder.stagedir, runtimeContext, True)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/user/kmaroney/.conda/envs/Virtus/lib/python3.11/site-packages/cwltool/command_line_tool.py", line 485, in make_path_mapper
    return PathMapper(reffiles, runtimeContext.basedir, stagedir, separateDirs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/user/kmaroney/.conda/envs/Virtus/lib/python3.11/site-packages/cwltool/pathmapper.py", line 95, in __init__
    self.setup(dedup(referenced_files), basedir)
  File "/data/user/kmaroney/.conda/envs/Virtus/lib/python3.11/site-packages/cwltool/pathmapper.py", line 198, in setup
    self.visit(
  File "/data/user/kmaroney/.conda/envs/Virtus/lib/python3.11/site-packages/cwltool/pathmapper.py", line 158, in visit
    with SourceLine(
  File "schema_salad/sourceline.py", line 250, in __exit__
schema_salad.exceptions.ValidationException: [Errno 2] No such file or directory: '/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtus2/HS-13-25427_S26_R1_1.fastq.gz'
ERROR [step fastp_pe] Cannot make job: [Errno 2] No such file or directory: '/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtus2/HS-13-25427_S26_R1_1.fastq.gz'

I do not understand every flag, but the "red" item was a little confusing:

ERROR [step fastp_pe] Cannot make job: [Errno 2] No such file or directory: '/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtus2/HS-13-25427_S26_R1_1.fastq.gz'

It is confusing because input.fastq.csv includes the full path of each FASTQ file (and I changed the suffixes, as the tutorial suggests, to match each filename):

input.fastq.csv:

Name,fastq,Layout,Group
R1,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/HS-13-25427_S26,PE,Recurrent
R2,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-11-21696_S35,PE,Recurrent
R3,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-16-8910_S34,PE,Recurrent
NR1,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/HS-15-2469_S32,PE,Non-recurrent
NR2,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-09-30780_S31,PE,Non-recurrent
NR3,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-09-35192_S38,PE,Non-recurrent
NR4,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-10-15138_S40,PE,Non-recurrent
NR5,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-11-10564_S36,PE,Non-recurrent
NR6,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-14-13428_S27,PE,Non-recurrent
NR7,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-14-8265_S25,PE,Non-recurrent
NR8,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-16-13674_S29,PE,Non-recurrent
NR9,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-16-8910_S34,PE,Non-recurrent
NR10,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-17-14189_S28,PE,Non-recurrent

Command run from the project's Virtus2 analysis folder:

~/programs/VIRTUS2/wrapper/VIRTUS_wrapper.py input.fastq.csv --fastq --VIRTUSDir ~/programs/VIRTUS2/ -s1 _R1_1.fastq.gz -s2 _R2_2.fastq.gz --genomeDir_human ~/programs/VIRTUS2/workflow/STAR_index_human --genomeDir_virus ~/programs/VIRTUS2/workflow/STAR_index_virus --nthreads=4

I solved this first problem by making a symbolic link to every fastq file from the original directory to the projectname/Virtus2 directory.
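The symlink workaround described above can be scripted rather than done by hand. The following is a minimal sketch, not part of VIRTUS2: the function name link_fastqs and the default glob pattern are assumptions made for illustration.

```python
from pathlib import Path


def link_fastqs(src_dir, dest_dir, pattern="*.fastq.gz"):
    """Symlink every file in src_dir matching pattern into dest_dir.

    Hypothetical helper (not part of VIRTUS2). Returns the links created.
    """
    src = Path(src_dir).resolve()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    created = []
    for fastq in sorted(src.glob(pattern)):
        link = dest / fastq.name
        # Skip names that already exist, including dangling symlinks.
        if link.exists() or link.is_symlink():
            continue
        link.symlink_to(fastq)
        created.append(link)
    return created
```

Calling link_fastqs on the raw_sequences directory and the Virtus2 directory would place one link per FASTQ file next to input.fastq.csv; running it a second time creates nothing new.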

It then gave me the error that Docker was not available, so I had to change VIRTUS_wrapper.py "cwltool" under PE condition to cwltool --singularity.
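For anyone making the same change, the idea is simply to put --singularity in front of the other cwltool arguments. A rough sketch of how a wrapper could assemble the command is below; cwltool_command and its parameters are hypothetical names, and only the cwltool flags --singularity and --rm-tmpdir are taken from the commands shown in this thread.

```python
def cwltool_command(cwl_path, tool_args, use_singularity=False):
    """Build the cwltool argument list, optionally selecting Singularity.

    Hypothetical helper: the real wrapper may build its command differently.
    """
    cmd = ["cwltool"]
    if use_singularity:
        # cwltool uses Docker by default; this flag switches to Singularity.
        cmd.append("--singularity")
    cmd += ["--rm-tmpdir", cwl_path]
    cmd += list(tool_args)
    return cmd
```

The returned list can be passed directly to subprocess.run, which avoids quoting problems with paths that contain spaces.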

It seems to be running now. This is probably very simple for you, but I wanted to give feedback, because I assume many people trying to use your well-documented tool on a high-performance computing cluster will also be unable to use Docker. I am also unsure whether any of these warnings are important, or whether you think they are OK to ignore. Here is how far it has got after adjusting those parameters:

 ~/programs/VIRTUS2/wrapper/VIRTUS_wrapper2.py input.fastq.csv     --fastq     --VIRTUSDir ~/programs/VIRTUS2/workflow/     -s1 _R1_1.fastq.gz     -s2 _R2_2.fastq.gz     --genomeDir_human ~/programs/VIRTUS2/workflow/STAR_index_human     --genomeDir_virus ~/programs/VIRTUS2/workflow/STAR_index_virus     --nthreads=4
bash: /home/kmaroney/programs/VIRTUS2/wrapper/VIRTUS_wrapper2.py: Permission denied
(Virtus) [kmaroney@c0140 Virtus2]$ chmod -R 777 ~/programs/VIRTUS2/
(Virtus) [kmaroney@c0140 Virtus2]$ ~/programs/VIRTUS2/wrapper/VIRTUS_wrapper2.py input.fastq.csv     --fastq     --VIRTUSDir ~/programs/VIRTUS2/workflow/     -s1 _R1_1.fastq.gz     -s2 _R2_2.fastq.gz     --genomeDir_human ~/programs/VIRTUS2/workflow/STAR_index_human     --genomeDir_virus ~/programs/VIRTUS2/workflow/STAR_index_virus     --nthreads=4
/home/kmaroney/programs/VIRTUS2/workflow/
cwltool --singularity --rm-tmpdir /home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl --fastq1 ../HS-13-25427_S26_R1_1.fastq.gz --fastq2 ../HS-13-25427_S26_R2_2.fastq.gz --genomeDir_human /home/kmaroney/programs/VIRTUS2/workflow/STAR_index_human --genomeDir_virus /home/kmaroney/programs/VIRTUS2/workflow/STAR_index_virus --outFileNamePrefix_human human --nthreads 4 --filename_output VIRTUS.output.txt

INFO /data/user/kmaroney/.conda/envs/Virtus/bin/cwltool 3.1.20230624081518
INFO Resolved '/home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl' to 'file:///home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl'
WARNING Workflow checker warning:
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:178:9: Source
                                                                                     'mappingstats' of
                                                                                     type ["null",
                                                                                     "File"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:313:9:   with sink
                                                                                       'input_STARLog' of
                                                                                       type "File"
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:41:5:  Source
                                                                                     'filename_output'
                                                                                     of type ["null",
                                                                                     "string"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:319:9:   with sink
                                                                                       'filename_output'
                                                                                       of type "string"
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:283:9: Source 'output' of
                                                                                     type ["null",
                                                                                     "File"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:290:9:   with sink 'fq1'
                                                                                       of type "File"
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:268:9: Source 'output' of
                                                                                     type ["null",
                                                                                     "File"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:292:9:   with sink 'fq2'
                                                                                       of type "File"
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:253:9: Source 'output' of
                                                                                     type ["null",
                                                                                     "File"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:303:9:   with sink 'input'
                                                                                       of type "File"
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:215:9: Source 'output_fq2'
                                                                                     of type ["null",
                                                                                     "File"] may be
                                                                                     incompatible
../../../../../../../../home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl:263:9:   with sink
                                                                                       'input_fq' of type
                                                                                       "File"
INFO [workflow ] start
INFO [workflow ] starting step fastp_pe
INFO [step fastp_pe] start
INFO ['singularity', 'pull', '--force', '--name', 'quay.io_biocontainers_fastp:0.20.0--hdbcaa40_0.sif', 'docker://quay.io/biocontainers/fastp:0.20.0--hdbcaa40_0']
WARN[0000] "/run/user/12031" directory set by $XDG_RUNTIME_DIR does not exist. Either create the directory or unset $XDG_RUNTIME_DIR.: stat /run/user/12031: no such file or directory: Trying to pull image in the event that it is a public image.
INFO [job fastp_pe] /scratch/local/6mm7ee7v$ singularity \
    --quiet \
    exec \
    --contain \
    --ipc \
    --cleanenv \
    --pid \
    --home \
    /local/6mm7ee7v:/xDYCyQ \
    --bind \
    /local/_9uxjpka:/tmp \
    --bind \
    /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtus2/../data/sequences/raw_sequences/HS-13-25427_S26_R1_1.fastq.gz:/var/lib/cwl/stg1172a923-ecfa-4ba3-adbb-66c88cb2d534/HS-13-25427_S26_R1_1.fastq.gz:ro \
    --bind \
    /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtus2/../data/sequences/raw_sequences/HS-13-25427_S26_R2_2.fastq.gz:/var/lib/cwl/stgc3f96679-08a5-4d0b-90d4-282646b5a91c/HS-13-25427_S26_R2_2.fastq.gz:ro \
    --pwd \
    /xDYCyQ \
    /data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/Virtus2/R1/quay.io_biocontainers_fastp:0.20.0--hdbcaa40_0.sif \
    /bin/sh \
    -c \
    fastp -o HS-13-25427_S26_R1_1.fastp.fastq -O HS-13-25427_S26_R2_2.fastp.fastq  --trim_poly_x  -i /var/lib/cwl/stg1172a923-ecfa-4ba3-adbb-66c88cb2d534/HS-13-25427_S26_R1_1.fastq.gz -I /var/lib/cwl/stgc3f96679-08a5-4d0b-90d4-282646b5a91c/HS-13-25427_S26_R2_2.fastq.gz --length_required 40 --thread 4
WARNING: Skipping mount /data/rc/apps/rc/software/Singularity/3.5.2-GCC-5.4.0-2.26/var/singularity/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
KevinMaroney commented 1 year ago

I also ran into this error:

cwltool --singularity --rm-tmpdir /home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl --fastq1 ../S-17-14189_S28_R1_1.fastq.gz --fastq2 ../S-17-14189_S28_R2_2.fastq.gz --genomeDir_human /home/kmaroney/programs/VIRTUS2/workflow/STAR_index_human --genomeDir_virus /home/kmaroney/programs/VIRTUS2/workflow/STAR_index_virus --outFileNamePrefix_human human --nthreads 40 --filename_output VIRTUS.output.txt

Traceback (most recent call last):
  File "/home/kmaroney/programs/VIRTUS2/wrapper/VIRTUS_wrapper2.py", line 198, in <module>
    df_cov = df_cov.loc[list_index]
  File "/data/user/kmaroney/.conda/envs/Virtus/lib/python3.11/site-packages/pandas/core/indexing.py", line 1091, in __getitem__
    check_dict_or_set_indexers(key)
  File "/data/user/kmaroney/.conda/envs/Virtus/lib/python3.11/site-packages/pandas/core/indexing.py", line 2618, in check_dict_or_set_indexers
    raise TypeError(
TypeError: Passing a set as an indexer is not supported. Use a list instead.

Changing all instances of [list_index] to [list(list_index)] seems to have fixed the problem. I'm not very familiar with Python, but maybe a package changed its behavior? Sorry, and thank you.
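A small illustration of that change, assuming list_index is a Python set despite its name (which is what the TypeError implies). The DataFrame contents here are made up for the example and do not come from VIRTUS2.

```python
import pandas as pd

# Made-up coverage table standing in for df_cov in the wrapper.
df_cov = pd.DataFrame(
    {"coverage": [0.9, 0.1, 0.5]},
    index=["virus_a", "virus_b", "virus_c"],
)

# Assumed to be a set, as the error message suggests.
list_index = {"virus_a", "virus_c"}

# Converting the set to a list makes it a valid .loc indexer.
selected = df_cov.loc[list(list_index)]

# sorted() also returns a list and gives a reproducible row order,
# since the iteration order of a set of strings is not guaranteed.
selected_sorted = df_cov.loc[sorted(list_index)]
```

Either form avoids the error; sorted() may be preferable if the row order of the output table matters.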
yyoshiaki commented 1 year ago

Sorry for my late reply. I added the --singularity option to VIRTUS_wrapper.py in v2.0.2 (developmental version). Regarding the path problem, I couldn't reproduce that.

TypeError: Passing a set as an indexer is not supported. Use a list instead.

Thank you, you are right. That happened due to the pandas update. I already fixed the issue in v2.0.2.

StevenWijnen commented 4 months ago

I ran into the same error: the files I specified were not found by the VIRTUS wrapper. I fixed it by adding the line dir_name = os.path.dirname(item["SRR"]) at line 70 of VIRTUS_wrapper.py and changing lines 132 and 133 to

"--fastq1", os.path.join(dir_name, fastq1), "--fastq2", os.path.join(dir_name, fastq2),

This fix, however, covers only PE. To implement it for SE as well, you need to do the same at line 146, inside the elif statement elif item["Layout"] =="SE".
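Put together, the idea looks roughly like the sketch below. The function fastq_args and its parameters are illustrative names only; the actual wrapper uses item["SRR"] and inline variables, and the sketch assumes the CSV entry is an absolute sample prefix as in the original report.

```python
import os


def fastq_args(sample_prefix, suffix1, suffix2):
    """Build --fastq1/--fastq2 arguments that keep the directory from the CSV.

    Hypothetical helper: sample_prefix is the value of the fastq column,
    suffix1 and suffix2 are the values passed to -s1 and -s2.
    """
    dir_name = os.path.dirname(sample_prefix)
    base = os.path.basename(sample_prefix)
    fastq1 = base + suffix1
    fastq2 = base + suffix2
    return [
        "--fastq1", os.path.join(dir_name, fastq1),
        "--fastq2", os.path.join(dir_name, fastq2),
    ]
```

With a relative prefix, dir_name would be relative too, so it may be worth wrapping it in os.path.abspath before the wrapper changes its working directory.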

The specified directories in the input.fastq.csv file were not copied correctly by the wrapper script. The fix of @SomeGuy3865 also worked but I found this easier than softlinking an entire directory of fastq files.

Hopefully this helps you to implement the fix.

And most of all thanks a lot for the very intuitive code and documentation. Really nice package/tool!

antoine4ucsd commented 4 months ago

Hello, I agree: really nice pipeline! I am still struggling with the wrapper for paired FASTQ files. Can you clarify how to use it? The current template is

Name,fastq,Layout,Group
Flu_1,SRR9856912,PE,H3N2
Flu_2,SRR9856913,PE,H3N2
Ctrl_1,SRR9856914,PE,Mock
Ctrl_2,SRR9856915,PE,Mock

but the wrapper does not detect the FASTQ files if they are named as follows: ./data/sample_R1.fasq.gz and ./data/sample_R2.fasq.gz

I tried various settings such as

Name,fastq,Layout,Group
Flu_1,./data/sample_R1,PE,H3N2

I get this error

./VIRTUS_wrapper.py:76: SyntaxWarning: invalid escape sequence '\.'
  pattern_1 = "^" + sample_index + "_1((\.fq\.gz)|(\.fq)|(\.fastq)|(\.fastq\.gz))$"
~/_virtus/VIRTUS2/wrapper/./VIRTUS_wrapper.py:86: SyntaxWarning: invalid escape sequence '\.'
  pattern_2 = "^" + sample_index + "_2((\.fq\.gz)|(\.fq)|(\.fastq)|(\.fastq\.gz))$"
~/_virtus/VIRTUS2/wrapper/./VIRTUS_wrapper.py:97: SyntaxWarning: invalid escape sequence '\.'
  pattern = "^" + sample_index + "((\.fq\.gz)|(\.fq)|(\.fastq)|(\.fastq\.gz))$"
/Users/antoinechaillon/Dropbox/_virtus/VIRTUS2/
fastq_1 not found
fastq_2 not found
Traceback (most recent call last):
  File "~/_virtus/VIRTUS2/wrapper/./VIRTUS_wrapper.py", line 130, in <module>
    "--fastq1", '../'+fastq1,
                      ^^^^^^
NameError: name 'fastq1' is not defined

Any thoughts? Thank you.

yyoshiaki commented 4 months ago

Hi, for your case, please use -s1 _R1.fastq.gz -s2 _R2.fastq.gz

and

Name,fastq,Layout,Group
Flu_1,./data/sample,PE,H3N2

Thank you!
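To illustrate why the fastq column should hold only the sample prefix, here is a rough sketch of prefix-plus-suffix matching. It is not the wrapper's actual code: find_fastq, read_tag and DEFAULT_EXT are hypothetical names, and only the extension alternatives are copied from the patterns in the warning output above. The sketch also uses a raw string for the regular expression, which avoids the SyntaxWarning about '\.' that Python 3.12 and later emit for invalid escape sequences in ordinary strings.

```python
import re

# Raw string, so "\." reaches the regex engine without a SyntaxWarning.
DEFAULT_EXT = r"((\.fq\.gz)|(\.fq)|(\.fastq)|(\.fastq\.gz))"


def find_fastq(filenames, sample_index, read_tag="_1", suffix=None):
    """Return the first filename matching the sample, or None.

    Hypothetical helper. With suffix given (as with -s1/-s2), the name must
    be exactly sample_index + suffix; otherwise the default pattern
    sample_index + read_tag + extension is used.
    """
    if suffix is not None:
        pattern = "^" + re.escape(sample_index + suffix) + "$"
    else:
        pattern = "^" + re.escape(sample_index) + re.escape(read_tag) + DEFAULT_EXT + "$"
    for name in filenames:
        if re.match(pattern, name):
            return name
    return None


files = ["sample_R1.fastq.gz", "sample_R2.fastq.gz", "SRR9856912_1.fq.gz"]
first_read = find_fastq(files, "sample", suffix="_R1.fastq.gz")
default_read = find_fastq(files, "SRR9856912")
wrong_prefix = find_fastq(files, "sample_R1")
```

Here first_read and default_read resolve to a file, while wrong_prefix is None because the read tag would be appended a second time, which matches the "fastq_1 not found" message in the earlier comment.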