Open ahsen1402 opened 6 years ago
Thank you for writing. That needs to be updated, but the person who was making the fix has not completed it yet. In the meantime, if you send me an email (mhalush1@jhmi.edu), I can get you the files you need that were left out of the last .gz file.
missingtrffiles.zip Actually, I think this includes all of the missing files.
Hi Mark,
Thanks for sharing this. I will try it and let you know if I have any further issues. One more quick question about the file unmapped.csv: are those reads that are aligned to the human genome but have no known annotation, or are they just the rest of the reads that do not appear in the mapped.csv file?
Thanks
The reads in the unmapped.csv file are all the reads that do not appear in the mapped.csv file. They have not been aligned to the human genome. I occasionally BLAST abundant reads from that file and find that some align to repeat elements in the human genome. Many do not align to anything. I hope that is helpful.
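For anyone who wants to repeat that spot check, here is a minimal sketch that pulls the most abundant reads out of unmapped.csv and formats them as FASTA for pasting into BLAST. The column layout is an assumption, not something confirmed in this thread: the sketch expects a header row, the read sequence in the first column, and one integer count per sample in the remaining columns. The function names are hypothetical. Check the header of your own file before relying on it.

```python
import csv


def top_unmapped_reads(csv_file, n=10):
    """Return the n most abundant (sequence, total_count) pairs.

    ASSUMPTION: the first row is a header, the first column is the read
    sequence, and every other column is an integer count for one sample.
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    totals = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        sequence = row[0]
        total = sum(int(value) for value in row[1:] if value.strip())
        totals.append((sequence, total))
    # Sort by summed count, highest first; ties keep their file order.
    totals.sort(key=lambda pair: pair[1], reverse=True)
    return totals[:n]


def to_fasta(reads):
    """Format (sequence, total_count) pairs as FASTA text."""
    lines = []
    for rank, (sequence, total) in enumerate(reads, start=1):
        lines.append(">unmapped_{}_count_{}".format(rank, total))
        lines.append(sequence)
    return "\n".join(lines) + "\n"
```

Usage would look like `with open("unmapped.csv") as handle: print(to_fasta(top_unmapped_reads(handle, n=20)))`.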
Hi Mark,
With the data you provided I was able to continue, but this time I ran into another issue. Here is a snapshot of the error; any idea what might be wrong?
Process Worker-2:
Traceback (most recent call last):
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/site-packages/mirge/utils/trim_file.py", line 54, in run
    read = modifier(read)
TypeError: __call__() takes exactly 3 arguments (2 given)
Process Worker-3:
Traceback (most recent call last):
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/site-packages/mirge/utils/trim_file.py", line 54, in run
    read = modifier(read)
TypeError: __call__() takes exactly 3 arguments (2 given)
Process Worker-4:
Yes - someone else had the same error and we think the problem is the Python packages. Specifically, we think the error occurs if you are using cutadapt v1.18; you need to use cutadapt v1.11. If that doesn't work, please make sure all of your Python packages exactly match these: cutadapt (v1.11), biopython (v1.68), numpy (v1.11.3), scipy (v0.17.0), matplotlib (v2.1.1), pandas (v0.21.0), sklearn (v0.18.1), reportlab (v3.3.0) and forgi (v0.20). We think there is a forward incompatibility problem that we need to solve. Thank you for letting me know, and let me know if this solves the problem.
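A quick way to compare an environment against that list is sketched below. It is only an illustration: the function names are hypothetical, the distribution name "scikit-learn" for sklearn is an assumption about how the package is registered, and the comparison is an exact string match, so "0.20" and "0.20.0" would be reported as different.

```python
# Versions quoted in this thread. ASSUMPTION: sklearn is installed under
# the distribution name "scikit-learn".
PINNED = {
    "cutadapt": "1.11",
    "biopython": "1.68",
    "numpy": "1.11.3",
    "scipy": "0.17.0",
    "matplotlib": "2.1.1",
    "pandas": "0.21.0",
    "scikit-learn": "0.18.1",
    "reportlab": "3.3.0",
    "forgi": "0.20",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    try:
        from importlib.metadata import version, PackageNotFoundError
        lookup, missing = version, PackageNotFoundError
    except ImportError:
        # Older interpreters such as Python 2.7: fall back to setuptools.
        import pkg_resources
        lookup = lambda name: pkg_resources.get_distribution(name).version
        missing = pkg_resources.DistributionNotFound
    found = {}
    for name in names:
        try:
            found[name] = lookup(name)
        except missing:
            found[name] = None
    return found


def find_mismatches(installed, pinned=PINNED):
    """Return {name: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, expected in pinned.items():
        found = installed.get(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `find_mismatches(installed_versions(PINNED))` inside the miRge environment returns an empty dict when everything matches, and otherwise lists the expected and found version for each package that differs.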
Hi Mark,
Will try this. One quick remark to make your job easier: I am already feeding in fastq files that I trimmed beforehand, so I am not using any adapter option. Do you think miRge still calls cutadapt? I can confirm that all packages are the same; I was able to run mirge in the same environment back in May. However, I recently tried to update it using bioconda, which I think started the problem.
That is interesting. I think miRge still calls cutadapt even if they are trimmed files as it still removes poor quality reads through that function of cutadapt. I'm sorry the update caused the problem and we'll try to figure out what we might have changed (besides leaving out some tRNA files). I know the last version of miRge added a tRF finder which was a significant change/addition to the program. It's possible that caused some incompatibilities that you are now seeing. If the packages fix doesn't work, please let me know.
Were you able to get it to run, or is it still failing?
Hi Mark, sorry for my late reply. Unfortunately, I got this error this time:
Performing annotation for all of the collasped sequences...
All annotation cycles completed (6837.66 sec).
Summarizing and tabulating results...
Traceback (most recent call last):
  File "/soft/enter/envs/mirge1/bin/miRge2.0", line 11, in <module>
    load_entry_point('mirge==2.0', 'console_scripts', 'miRge2.0')()
  File "/soft/enter/envs/mirge1/lib/python2.7/site-packages/mirge/__main__.py", line 389, in main
    writeDataToCSV(outputdir, annotNameList, sampleList, isomirDiff, a_to_i, logDic, seqDic, mirDic, mirNameSeqDic, mirMergedNameDic, bowtieBinary, genome_index, numCPU, phred64, removedMiRNA_ai_List, spikeIn, gff_output, isomiRContentDic, miRNA_database, trf_output, trfContentDic, trnaStruDic, pre_tRNA_index, duptRNA2UniqueDic, trnaAAanticodonDic, tRNAtrfDic, trfMergedNameDic, trfMergedList)
  File "/soft/enter/envs/mirge1/lib/python2.7/site-packages/mirge/utils/writeDataToCSV.py", line 1080, in writeDataToCSV
    mimatchState, mismatchPosition = detectMismach(contentTmp[0], tRNA_seq, contentTmp[2])
  File "/soft/enter/envs/mirge1/lib/python2.7/site-packages/mirge/utils/writeDataToCSV.py", line 541, in detectMismach
    if target_seq[i] != seqTmp[i]:
IndexError: string index out of range
I'll think about the problem a bit more tomorrow, but I see the annotation cycles were 6837 seconds which is a really long time. I suspect you aligned multiple fastq files at once, but I wonder if you hit some sort of max buffer in your RAM that caused the write function to fail. If you run only one .fastq file, do you get the same error?
Hi Mark,
With one sample it finished without errors:
Summarizing and tabulating results...
The number of A-to-I editing sites for is less than 10 so that no heatmap is drawn.
Summary Complete (150.93 sec)
Annotation of miRge2.0 Completed (412.83 sec)
Is the algorithm deterministic up to the bowtie assignments, so that I can run my data in batches, or does it use information from other samples while analyzing a given sample?
I'm glad it partially worked. If you are just annotating, you could run them all one at a time or in smaller batches and it won't have any negative effects. If you are trying to identify novel miRNAs, you may end up with a more repetitive experience, but still get the correct data. I frequently run up to 10 samples together without any issues. I've done more, but only with smaller fastq files (<2 million reads each).
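If it helps, splitting a sample list into batches of at most 10 can be sketched as below. The helper name is hypothetical and the sketch only groups the file names; each batch would then be passed to its own miRge run, which is not shown here.

```python
def batches(items, size=10):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[start:start + size] for start in range(0, len(items), size)]
```

For example, 23 fastq files with the default size give three batches of 10, 10 and 3 files, in the original order.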
Hi Mark, I can't download the human library from the URL. Can you give me the latest human library files?
@chenlx2014 Can you provide your email address? I will share the zip file for the human libraries.
18246094519@163.com Thanks for your sharing.
My fastq file is phred33, but the default format is phred64. What should I do?
I think our help page is incorrect. The default is phred33. Please run the file without calling -phred64 and it should be fine. Let me know if you have any problems with that.
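If you are unsure which encoding a fastq file uses, a rough heuristic is to look at the range of quality characters. This is not part of miRge, and the function names are hypothetical. Characters below ASCII 59 can only occur in phred33 data, while characters above ASCII 74 suggest phred64; anything in between is ambiguous, so the sketch returns None in that case.

```python
def quality_lines(fastq_lines, max_records=1000):
    """Yield the quality line (4th line) of the first max_records records."""
    for index, line in enumerate(fastq_lines):
        if index // 4 >= max_records:
            break
        if index % 4 == 3:
            yield line.rstrip("\n")


def guess_phred_offset(qualities):
    """Return 33, 64, or None when the characters seen are not decisive."""
    codes = [ord(char) for line in qualities for char in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        return 33  # characters this low cannot occur in phred64 data
    if max(codes) > 74:
        return 64  # characters this high are not typical of phred33 data
    return None
```

Usage would look like `with open("sample.fastq") as handle: print(guess_phred_offset(quality_lines(handle)))`.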
Hi, I downloaded the human library from
command, but when running mirge I got the error "/human/annotation.Libs/human_trna.str does not exsit. Please check it.", so I guess the database needs to be updated. Is there any other place I can download it for now (or any other possibly missing files)?
Thanks in advance.
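A small check like the one below can list which expected files are absent from a downloaded library before starting a run. It is only a sketch with a hypothetical helper name: the one path known from this thread is human/annotation.Libs/human_trna.str, and any other entries in the expected list would have to come from the miRge documentation or from the maintainers.

```python
import os


def missing_files(library_root, relative_paths):
    """Return the relative paths that do not exist under library_root."""
    return [
        path
        for path in relative_paths
        if not os.path.isfile(os.path.join(library_root, path))
    ]


# The only path taken from this thread; extend the list as needed.
EXPECTED = [os.path.join("human", "annotation.Libs", "human_trna.str")]
```

Calling `missing_files("/path/to/miRge.Libs", EXPECTED)` returns an empty list when every listed file is present; the root path here is a placeholder.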