mhalushka / miRge

miRge - microRNA alignment software for small RNA-seq data, now at v2.0
GNU General Public License v3.0

/human/annotation.Libs/human_trna.str does not exsit. Please check it. #13

Open ahsen1402 opened 6 years ago

ahsen1402 commented 6 years ago

Hi, I downloaded the human library with the command

wget -O human.tar.gz https://jh.box.com/shared/static/rj7ufy5v15uw7ytsyyrsryw99u7ml82j.gz

but when running miRge I got the error "/human/annotation.Libs/human_trna.str does not exsit. Please check it.", so I guess the database needs to be updated. Is there anywhere else I can download it for now (along with any other files that may be missing)?

Thanks in advance.

mhalushka commented 6 years ago

Thank you for writing. That needs to be updated, but the person who was making the fix has not completed it. In the meantime, if you send me an email (mhalush1@jhmi.edu), I can get you the files you need that were left out of the last .gz file.

mhalushka commented 6 years ago

missingtrffiles.zip

Actually, I think this includes all of the missing files.

ahsen1402 commented 6 years ago

Hi Mark,

Thanks for sharing this. I will try it and let you know if I have any further issues. One more quick question about the file unmapped.csv: are those reads that are aligned to the human genome but have no known annotation? Or are they just the rest of the reads that do not appear in the mapped.csv file?

Thanks

mhalushka commented 6 years ago

The reads in the unmapped.csv file are all the reads that do not appear in the mapped.csv file. They have not been aligned to the human genome. I occasionally BLAST abundant reads from that file and find that some align to repeat elements in the human genome. Many do not align to anything. I hope that is helpful.
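For anyone who wants to try the same spot check, here is a minimal sketch that pulls the most abundant sequences out of a CSV and formats them as FASTA for pasting into BLAST. The column layout is an assumption, not something confirmed in this thread: it treats the first column as the read sequence and sums every integer-valued column after it as per-sample counts. The function name `top_reads_as_fasta` is hypothetical.

```python
import csv
import io


def top_reads_as_fasta(handle, n=10):
    """Return the n most abundant reads from a CSV handle as FASTA text.

    ASSUMPTION: column 0 holds the sequence and every later column that
    parses as an integer is a per-sample read count. Check the real
    unmapped.csv header before relying on this.
    """
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row (assumed to be present)
    totals = []
    for row in reader:
        if not row:
            continue
        total = 0
        for cell in row[1:]:
            try:
                total += int(cell)
            except ValueError:
                pass  # ignore non-numeric (annotation) columns
        totals.append((total, row[0]))
    # Most abundant first; ties broken alphabetically for stable output.
    totals.sort(key=lambda item: (-item[0], item[1]))
    lines = []
    for rank, (total, seq) in enumerate(totals[:n], start=1):
        lines.append(">unmapped_%d count=%d" % (rank, total))
        lines.append(seq)
    return "\n".join(lines)


if __name__ == "__main__":
    demo = io.StringIO(u"seq,sample1,sample2\nAAAA,5,1\nCCCC,10,0\n")
    print(top_reads_as_fasta(demo, n=2))
```

With a real file, open it with `open("unmapped.csv")` and pass the handle in place of the demo `StringIO`.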

ahsen1402 commented 6 years ago

Hi Mark,

With the files you sent I was able to continue, but this time I ran into another issue. Below is a snapshot of the error. Any idea what might be wrong?

Process Worker-2:
Traceback (most recent call last):
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/site-packages/mirge/utils/trim_file.py", line 54, in run
    read = modifier(read)
TypeError: __call__() takes exactly 3 arguments (2 given)
Process Worker-3:
Traceback (most recent call last):
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/hpc/users/soft/enter/envs/mirge/lib/python2.7/site-packages/mirge/utils/trim_file.py", line 54, in run
    read = modifier(read)
TypeError: __call__() takes exactly 3 arguments (2 given)
Process Worker-4:

mhalushka commented 6 years ago

Yes, someone else had the same error, and we think the problem is the Python package versions. Specifically, we think the error occurs if you are using cutadapt v1.18; you need to use cutadapt v1.11. If that doesn't work, please make sure all of your Python packages exactly match these: cutadapt(v1.11), biopython(v1.68), numpy(v1.11.3), scipy(v0.17.0), matplotlib(v2.1.1), pandas(v0.21.0), sklearn(v0.18.1), reportlab(v3.3.0) and forgi(v0.20). We think there is a forward-compatibility problem that we need to solve. Thank you for letting me know, and please tell me whether this solves the problem.
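A quick way to compare an environment against the versions listed above is sketched below. It is only an illustration: the helper names are hypothetical, "sklearn" is looked up under its distribution name "scikit-learn", and versions are compared as plain strings, so "0.20" and "0.20.0" would be reported as different.

```python
# Pinned versions quoted in this thread for miRge2.0.
REQUIRED = {
    "cutadapt": "1.11",
    "biopython": "1.68",
    "numpy": "1.11.3",
    "scipy": "0.17.0",
    "matplotlib": "2.1.1",
    "pandas": "0.21.0",
    "scikit-learn": "0.18.1",  # written as "sklearn" in the thread
    "reportlab": "3.3.0",
    "forgi": "0.20",
}


def installed_version(name):
    """Return the installed version string, or None if not installed."""
    try:
        from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for older interpreters such as 2.7
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_version_mismatches(required, lookup=installed_version):
    """Return {package: (wanted, found)} for every package that differs."""
    mismatches = {}
    for name, wanted in sorted(required.items()):
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_version_mismatches(REQUIRED).items()):
        print("%s: wanted %s, found %s" % (name, wanted, found or "not installed"))
```

Run it with the interpreter of the environment that miRge is installed in; an empty output means every listed package matches.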

ahsen1402 commented 6 years ago

Hi Mark,

I will try this. One quick remark that may make your job easier: I am already feeding in fastq files that I trimmed beforehand, so I am not using any adapter option. Do you think miRge still calls cutadapt? I can confirm that all the packages are the same; I was able to run miRge in the same environment back in May. However, I recently tried to update it using bioconda, which I think started the problem.

mhalushka commented 6 years ago

That is interesting. I think miRge still calls cutadapt even if the files are already trimmed, because it still removes poor-quality reads through that function of cutadapt. I'm sorry the update caused the problem, and we'll try to figure out what we might have changed (besides leaving out some tRNA files). I know the last version of miRge added a tRF finder, which was a significant addition to the program. It's possible that caused some of the incompatibilities you are now seeing. If the packages fix doesn't work, please let me know.
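The traceback earlier in the thread shows `modifier(read)` failing because `__call__` expects one more argument than it was given. The sketch below reproduces that class of failure with hypothetical stand-in classes; it is not cutadapt's or miRge's actual code, only an illustration of how a caller written for an older callable signature breaks when a newer library version adds a required parameter.

```python
class OldStyleModifier(object):
    """Stand-in for a modifier whose __call__ takes only the read."""

    def __call__(self, read):
        return read.strip()


class NewStyleModifier(object):
    """Stand-in for a newer modifier that also requires a `matches` list."""

    def __call__(self, read, matches):
        matches.append(read)
        return read.strip()


def run_pipeline(modifiers, read):
    """Apply each modifier using the OLD calling convention."""
    for modifier in modifiers:
        read = modifier(read)  # same call shape as in the traceback
    return read


if __name__ == "__main__":
    print(run_pipeline([OldStyleModifier()], "ACGT\n"))
    try:
        run_pipeline([NewStyleModifier()], "ACGT\n")
    except TypeError as exc:
        # The caller has not been updated for the new signature.
        print("TypeError:", exc)
```

Pinning the library to the version the caller was written against, as suggested above, avoids the mismatch without touching either side's code.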

mhalushka commented 6 years ago

Were you able to get it to run, or is it still failing?

ahsen1402 commented 6 years ago

Hi Mark, sorry for my late reply. Unfortunately, I got this error this time:

Performing annotation for all of the collasped sequences...
All annotation cycles completed (6837.66 sec).

Summarizing and tabulating results...
Traceback (most recent call last):
  File "/soft/enter/envs/mirge1/bin/miRge2.0", line 11, in <module>
    load_entry_point('mirge==2.0', 'console_scripts', 'miRge2.0')()
  File "/soft/enter/envs/mirge1/lib/python2.7/site-packages/mirge/__main__.py", line 389, in main
    writeDataToCSV(outputdir, annotNameList, sampleList, isomirDiff, a_to_i, logDic, seqDic, mirDic, mirNameSeqDic, mirMergedNameDic, bowtieBinary, genome_index, numCPU, phred64, removedMiRNA_ai_List, spikeIn, gff_output, isomiRContentDic, miRNA_database, trf_output, trfContentDic, trnaStruDic, pre_tRNA_index, duptRNA2UniqueDic, trnaAAanticodonDic, tRNAtrfDic, trfMergedNameDic, trfMergedList)
  File "/soft/enter/envs/mirge1/lib/python2.7/site-packages/mirge/utils/writeDataToCSV.py", line 1080, in writeDataToCSV
    mimatchState, mismatchPosition = detectMismach(contentTmp[0], tRNA_seq, contentTmp[2])
  File "/soft/enter/envs/mirge1/lib/python2.7/site-packages/mirge/utils/writeDataToCSV.py", line 541, in detectMismach
    if target_seq[i] != seqTmp[i]:
IndexError: string index out of range

mhalushka commented 6 years ago

I'll think about the problem a bit more tomorrow, but I see the annotation cycles were 6837 seconds which is a really long time. I suspect you aligned multiple fastq files at once, but I wonder if you hit some sort of max buffer in your RAM that caused the write function to fail. If you run only one .fastq file, do you get the same error?
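Whatever the root cause turns out to be, the last frame of the traceback is a position-by-position comparison that indexed past the end of one of the two strings. The sketch below shows, with hypothetical function names and no knowledge of the real loop bounds in writeDataToCSV.py, how such a comparison fails when the strings differ in length and how a guarded variant could report the overhang instead of crashing.

```python
def mismatch_positions_unsafe(target_seq, read_seq):
    """Compare over the full read length; fails if the target is shorter."""
    positions = []
    for i in range(len(read_seq)):
        if target_seq[i] != read_seq[i]:  # IndexError when i >= len(target_seq)
            positions.append(i)
    return positions


def mismatch_positions_guarded(target_seq, read_seq):
    """Compare only the overlapping part and report the unmatched overhang."""
    overlap = min(len(target_seq), len(read_seq))
    positions = [i for i in range(overlap) if target_seq[i] != read_seq[i]]
    overhang = len(read_seq) - overlap  # read bases with no target counterpart
    return positions, overhang


if __name__ == "__main__":
    print(mismatch_positions_guarded("ACGT", "ACTTAA"))
    try:
        mismatch_positions_unsafe("ACGT", "ACTTAA")
    except IndexError as exc:
        print("IndexError:", exc)
```

Logging the two sequences just before the failing comparison would show which input triggers the length mismatch.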

ahsen1402 commented 6 years ago

Hi Mark,

With one sample it finished without errors:

Summarizing and tabulating results...
The number of A-to-I editing sites for is less than 10 so that no heatmap is drawn.
Summary Complete (150.93 sec)
Annotation of miRge2.0 Completed (412.83 sec)

Is the algorithm deterministic up to the bowtie assignments, so that I can run my data in batches? Or does it use information from other samples while analyzing a given sample?

mhalushka commented 6 years ago

I'm glad it partially worked. If you are just annotating, you could run them all one at a time or in smaller batches and it won't have any negative effects. If you are trying to identify novel miRNAs, you may end up with a more repetitive experience, but still get the correct data. I frequently run up to 10 samples together without any issues. I've done more, but only with smaller fastq files (<2 million reads each).
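Splitting a sample list into groups of at most 10, the size mentioned above, can be sketched as follows. The helper name `batches` and the file names are hypothetical; how each batch is then passed to miRge is left to the reader, since the exact command line depends on the installation.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    fastq_files = ["sample_%02d.fastq" % i for i in range(1, 24)]
    for number, group in enumerate(batches(fastq_files, 10), start=1):
        print("batch %d: %d files, first is %s" % (number, len(group), group[0]))
```

Because annotation does not depend on the other samples in a run, the per-batch results can be combined afterwards.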

chenlx2014 commented 4 years ago

Hi Mark, I can't download the human library from the URL. Can you give me the latest human library files?

arunhpatil commented 4 years ago

@chenlx2014 Can you give me your email address? I will share the zip file of the human libraries.

chenlx2014 commented 4 years ago

18246094519@163.com Thanks for sharing.

chenlx2014 commented 4 years ago

My fastq file is phred33, but the default format appears to be phred64. What should I do?

mhalushka commented 4 years ago

I think our help page is incorrect. The default is phred 33. Please run the file without calling -phred64 and it should be fine. Let me know if you have any problems with that.
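If it is unclear which encoding a file uses, the quality lines can be inspected directly. The sketch below is a common heuristic rather than anything miRge does: it assumes an uncompressed FASTQ with four lines per record, and the function name `guess_phred_offset` is hypothetical. Any quality character below ASCII 59 can only occur with a +33 offset; if the lowest character seen is 64 or higher, +64 is the likely offset; values in between are left undecided.

```python
def guess_phred_offset(path, max_reads=10000):
    """Guess the quality offset (33 or 64) of a 4-line-per-record FASTQ.

    Returns None when the file has no quality lines or the lowest
    quality character falls in the ambiguous range 59-63.
    """
    lowest = None
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no // 4 >= max_reads:
                break
            if line_no % 4 != 3:  # only the 4th line of each record is quality
                continue
            for char in line.rstrip("\r\n"):
                code = ord(char)
                if lowest is None or code < lowest:
                    lowest = code
    if lowest is None:
        return None
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None
```

A result of 33 means the file can be run without -phred64, as described above.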