Open KevinMaroney opened 1 year ago
I also ran into this error:
cwltool --singularity --rm-tmpdir /home/kmaroney/programs/VIRTUS2/workflow/VIRTUS.PE.cwl --fastq1 ../S-17-14189_S28_R1_1.fastq.gz --fastq2 ../S-17-14189_S28_R2_2.fastq.gz --genomeDir_human /home/kmaroney/programs/VIRTUS2/workflow/STAR_index_human --genomeDir_virus /home/kmaroney/programs/VIRTUS2/workflow/STAR_index_virus --outFileNamePrefix_human human --nthreads 40 --filename_output VIRTUS.output.txt
Traceback (most recent call last):
File "/home/kmaroney/programs/VIRTUS2/wrapper/VIRTUS_wrapper2.py", line 198, in
File "/data/user/kmaroney/.conda/envs/Virtus/lib/python3.11/site-packages/pandas/core/indexing.py", line 1091, in __getitem__
check_dict_or_set_indexers(key)
File "/data/user/kmaroney/.conda/envs/Virtus/lib/python3.11/site-packages/pandas/core/indexing.py", line 2618, in check_dict_or_set_indexers
raise TypeError(
TypeError: Passing a set as an indexer is not supported. Use a list instead.
Changing all instances of [list_index] to [list(list_index)] seems to have fixed the problem. I'm not very familiar with Python, but maybe a package changed its definitions or something? Sorry, and thank you.
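For anyone hitting the same TypeError: pandas 2.0 and later reject a set passed to .loc, while a list is accepted. Below is a minimal sketch of the workaround, not the actual wrapper code. The helper name to_list_indexer, the DataFrame contents, and the virus IDs are made up for illustration, and the pandas part only runs if pandas is installed.

```python
def to_list_indexer(labels):
    """Return labels as a list so pandas accepts them as an indexer."""
    if isinstance(labels, (set, frozenset)):
        # Sets are unordered, so sort to get a stable row order.
        return sorted(labels)
    return list(labels)


# Hypothetical set of index labels, standing in for list_index in the wrapper.
list_index = {"NC_007605.1", "NC_001806.2"}

try:
    import pandas as pd
except ImportError:  # the helper above still works without pandas
    pd = None

if pd is not None:
    df = pd.DataFrame(
        {"hits": [120, 3, 45]},
        index=["NC_001806.2", "NC_006273.2", "NC_007605.1"],
    )
    # df.loc[list_index] raises TypeError on pandas >= 2.0.
    subset = df.loc[to_list_indexer(list_index)]
    print(subset)
```

Plain list(list_index) is enough to silence the error. Sorting is an extra choice made here so that the row order does not depend on set iteration order.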
Sorry for my late reply. I added the --singularity option to VIRTUS_wrapper.py in v2.0.2 (developmental version). Regarding the path problem, I couldn't reproduce that.
TypeError: Passing a set as an indexer is not supported. Use a list instead.
Thank you, you are right. That happened due to the pandas update. I already fixed the issue in v2.0.2.
I ran into the same error: the files I specified were not found by the VIRTUS wrapper. I fixed it by adding the line dir_name = os.path.dirname(item["SRR"]) at line 70 of VIRTUS_wrapper.py and changing lines 132 and 133 to
"--fastq1", os.path.join(dir_name, fastq1), "--fastq2", os.path.join(dir_name, fastq2),
This fix only covers PE, however. To implement it for SE as well, make the same change at line 146, inside the elif item["Layout"] == "SE" branch.
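To make the idea concrete, here is a small standalone sketch of the directory handling described above. It is not the wrapper's real code. The function name build_fastq_args and the example path are hypothetical, and it assumes the CSV entry is a path prefix to which the mate suffixes are appended.

```python
import os


def build_fastq_args(sample_path, suffix1, suffix2):
    """Build the --fastq1/--fastq2 arguments, keeping the sample's directory."""
    dir_name = os.path.dirname(sample_path)   # "" when only a bare name is given
    base_name = os.path.basename(sample_path)
    fastq1 = base_name + suffix1
    fastq2 = base_name + suffix2
    return [
        "--fastq1", os.path.join(dir_name, fastq1),
        "--fastq2", os.path.join(dir_name, fastq2),
    ]


# Hypothetical sample entry with a directory component.
print(build_fastq_args("/data/run1/sampleA", "_1.fastq.gz", "_2.fastq.gz"))
```

Because os.path.join("", name) returns name unchanged, entries without a directory behave as before.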
The specified directories in the input.fastq.csv file were not copied correctly by the wrapper script. The fix of @SomeGuy3865 also worked but I found this easier than softlinking an entire directory of fastq files.
Hopefully this helps you to implement the fix.
And most of all thanks a lot for the very intuitive code and documentation. Really nice package/tool!
Hello, I agree, really nice pipeline! I'm still struggling with the wrapper for paired fastq files. Can you clarify how to use it? The current template is
Name,fastq,Layout,Group
Flu_1,SRR9856912,PE,H3N2
Flu_2,SRR9856913,PE,H3N2
Ctrl_1,SRR9856914,PE,Mock
Ctrl_2,SRR9856915,PE,Mock
but the wrapper does not detect my fastq files, which are named ./data/sample_R1.fastq.gz and ./data/sample_R2.fastq.gz.
I tried various settings such as
Name,fastq,Layout,Group
Flu_1,./data/sample_R1,PE,H3N2
I get this error
./VIRTUS_wrapper.py:76: SyntaxWarning: invalid escape sequence '\.'
pattern_1 = "^" + sample_index + "_1((\.fq\.gz)|(\.fq)|(\.fastq)|(\.fastq\.gz))$"
~/_virtus/VIRTUS2/wrapper/./VIRTUS_wrapper.py:86: SyntaxWarning: invalid escape sequence '\.'
pattern_2 = "^" + sample_index + "_2((\.fq\.gz)|(\.fq)|(\.fastq)|(\.fastq\.gz))$"
~/_virtus/VIRTUS2/wrapper/./VIRTUS_wrapper.py:97: SyntaxWarning: invalid escape sequence '\.'
pattern = "^" + sample_index + "((\.fq\.gz)|(\.fq)|(\.fastq)|(\.fastq\.gz))$"
/Users/antoinechaillon/Dropbox/_virtus/VIRTUS2/
fastq_1 not found
fastq_2 not found
Traceback (most recent call last):
File "~/_virtus/VIRTUS2/wrapper/./VIRTUS_wrapper.py", line 130, in <module>
"--fastq1", '../'+fastq1,
^^^^^^
NameError: name 'fastq1' is not defined
Any thoughts? Thank you.
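A side note on the SyntaxWarning lines in the output above: they come from writing \. inside a normal (non-raw) string literal, which newer Python versions flag. The sketch below builds the same kind of pattern with a raw string. The names FASTQ_EXT and mate_pattern are hypothetical, and escaping the sample name with re.escape is an extra precaution added here, not something the wrapper is known to do.

```python
import re

# Raw string: the backslashes reach the regex engine untouched,
# so Python does not warn about an invalid escape sequence.
FASTQ_EXT = r"((\.fq\.gz)|(\.fq)|(\.fastq)|(\.fastq\.gz))$"


def mate_pattern(sample_index, mate_suffix=""):
    """Compile a pattern matching '<sample_index><mate_suffix>' plus a fastq extension."""
    return re.compile(
        "^" + re.escape(sample_index) + re.escape(mate_suffix) + FASTQ_EXT
    )


pattern_1 = mate_pattern("SRR9856912", "_1")
print(bool(pattern_1.match("SRR9856912_1.fastq.gz")))
print(bool(pattern_1.match("SRR9856912_R1.fastq.gz")))
```

The warnings themselves are separate from the "fastq_1 not found" messages. The pattern shown in the warning text expects _1 and _2 after the sample name, so files ending in _R1 and _R2 do not match it.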
Hi, for your case, please use -s1 _R1.fastq.gz -s2 _R2.fastq.gz
and
Name,fastq,Layout,Group
Flu_1,./data/sample,PE,H3N2
Thank you!
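In case it helps others debug "fastq_1 not found" before launching a long run, here is a rough pre-flight check. It assumes that each PE row's fastq column is a path prefix and that the -s1 and -s2 suffixes are appended to it, as in the reply above. The function missing_fastqs is hypothetical and is not part of VIRTUS. SE rows are skipped because their suffix handling is not described in this thread.

```python
import csv
import io
import os


def missing_fastqs(csv_text, suffix1, suffix2, base_dir="."):
    """Return the expected paired-end fastq paths that do not exist on disk."""
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Layout"] != "PE":
            continue  # only paired-end rows are checked in this sketch
        for suffix in (suffix1, suffix2):
            path = os.path.join(base_dir, row["fastq"] + suffix)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


csv_text = "Name,fastq,Layout,Group\nFlu_1,./data/sample,PE,H3N2\n"
for path in missing_fastqs(csv_text, "_R1.fastq.gz", "_R2.fastq.gz"):
    print("not found:", path)
```

An empty result means every expected file was found relative to base_dir.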
Hello! Sorry for all of my many questions
I was able to get a count for one of my 16 fastq files using VIRTUS.PE.cwl, but I had to run it with cwltool --singularity. When I tried to use the .yaml file as a parameter file (for example, for downloading indices), it seemed unable to read the file, so I had to enter the flag values manually.
However, when I tried to use the wrapper instead of opening 16 tabs (each file takes overnight because of the 5.5 billion read sequencing depth of my collaborators), I received the following error:
I do not understand every flag, but the "red" item was a little confusing:
This is because I formatted input.fastq.csv to include the full file path of each fastq file (and changed the suffixes, as you suggested in the tutorial, to match each individual filename):
input.fastq.csv:
Name,fastq,Layout,Group
R1,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/HS-13-25427_S26,PE,Recurrent
R2,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-11-21696_S35,PE,Recurrent
R3,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-16-8910_S34,PE,Recurrent
NR1,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/HS-15-2469_S32,PE,Non-recurrent
NR2,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-09-30780_S31,PE,Non-recurrent
NR3,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-09-35192_S38,PE,Non-recurrent
NR4,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-10-15138_S40,PE,Non-recurrent
NR5,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-11-10564_S36,PE,Non-recurrent
NR6,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-14-13428_S27,PE,Non-recurrent
NR7,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-14-8265_S25,PE,Non-recurrent
NR8,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-16-13674_S29,PE,Non-recurrent
NR9,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-16-8910_S34,PE,Non-recurrent
NR10,/data/user/kmaroney/Projects/Shreshtha_Lab/Anal_Cancer_1/data/sequences/raw_sequences/S-17-14189_S28,PE,Non-recurrent
code in project folder/Virtus2 analysis:
~/programs/VIRTUS2/wrapper/VIRTUS_wrapper.py input.fastq.csv --fastq --VIRTUSDir ~/programs/VIRTUS2/ -s1 _R1_1.fastq.gz -s2 _R2_2.fastq.gz --genomeDir_human ~/programs/VIRTUS2/workflow/STAR_index_human --genomeDir_virus ~/programs/VIRTUS2/workflow/STAR_index_virus --nthreads=4
I solved this first problem by making a symbolic link to every fastq file from the original directory to the projectname/Virtus2 directory.
It then gave me an error that Docker was not available, so I had to change "cwltool" under the PE condition in VIRTUS_wrapper.py to cwltool --singularity.
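For illustration only, the change amounts to something like the sketch below, which adds the --singularity flag to the cwltool command when Docker is not on the PATH. The function build_cwltool_cmd and the parameter dictionary are hypothetical and do not reflect how VIRTUS_wrapper.py is actually written.

```python
import shutil


def build_cwltool_cmd(workflow, params, use_singularity=False):
    """Assemble a cwltool command line as a list of arguments."""
    cmd = ["cwltool"]
    if use_singularity:
        cmd.append("--singularity")  # run containers with Singularity instead of Docker
    cmd.append(workflow)
    for name, value in params.items():
        cmd += ["--" + name, str(value)]
    return cmd


# Fall back to Singularity when no docker executable is found (an assumption
# that suits many HPC clusters; adjust to your own environment).
use_singularity = shutil.which("docker") is None
print(build_cwltool_cmd("VIRTUS.PE.cwl", {"nthreads": 4}, use_singularity))
```

The resulting list can be handed to subprocess.run without going through a shell.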
It seems to be running now. This is probably very simple for you, but I wanted to give feedback, as I assume many people trying to use your well-documented tool on a high-performance computing cluster may also be unable to use Docker. I am also unsure whether any of these warnings are important, or whether you think they are OK to ignore. Here's where it's up to after adjusting those parameters: