molevol-ub / DOMINO

Development of molecular markers in non-model organisms
GNU General Public License v3.0

Assembly step issue(s) #12

Open civanovich-senck opened 4 years ago

civanovich-senck commented 4 years ago

Dear Developers,

I have encountered a couple of errors in the DM_assembly pipeline that I haven't been able to resolve:

1) MIRA: It seems that MIRA doesn't like assembling high-coverage reads (high coverage is intentional in our research). The log in ERROR.txt says:

"MIRA warncode: ASCOV_VERY_HIGH Title: Very high average coverage

You are running a genome de-novo assembly and the current best estimation for average coverage is 233x (note that this number can be +/- 20% off the real value). This is a pretty high coverage,higher than the current warning threshold of 80x."

So the question is: how can I, through DOMINO, pass MIRA the options needed to override this warning? Or, following MIRA's advice, is there a "correct" way to downsize my reads?

2) SPAdes (more specifically, the SPAdes bundled with DOMINO): Since I can't work with MIRA, I've tried the SPAdes route; however, I'm getting this error message:

"Can't use string ("sarcCTAB") as a HASH ref while "strict refs" in use at /home/civanovich/DOMINO/bin/lib/DOMINO.pm line 399, line 16."

This is odd, given that I'm using the files created in the DM_clean.pm step. Odder still, if I run my own SPAdes, I have no issue at all! The error appears even after running fastq_pair on the DM_clean output files (in case some reads were mismatched between the pairs).

Thanks.

Cristóbal I.

JFsanchezherrero commented 4 years ago

Dear Cristobal,

Sorry for the delay. I am not very involved in the project right now, and I thought some of the others might have given you some information, but I guess they haven't.

Just a bit of context: during the development of DOMINO we implemented MIRA so that DOMINO could use both 454 and Illumina reads. We were aware of MIRA's limitations, so we included SPAdes as a secondary assembler, intended only for larger computer clusters because of its high RAM requirements.

I wouldn't downsize your reads just to use MIRA; the more coverage, the better for a high-quality assembly. Also, MIRA might not be the best assembly strategy these days...

Unfortunately, I have been looking into the problem you reported and I can't figure it out. I guess it is due to the structure of the data and folders provided to DOMINO. As I mentioned, I have been out of the project for a long time.

Since you mentioned that you could run SPAdes outside DOMINO, I suggest that you assemble the reads for each sample using SPAdes, either this version or the latest available. Then you can proceed to the next step of the marker discovery process, using the script DM_MarkerScan_v1.1.pl to map reads to a reference and identify markers. You will need to provide the trimmed reads and contigs for each sample, identified as described in the instructions.

See the additional details on the parameters of DM_MarkerScan_v1.1.pl: -option user_assembly_contigs -user_contig_files file XX -user_cleanRead_files YY.
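
As a rough illustration of that workflow, here is a minimal Python sketch that only builds and prints the commands, so they can be reviewed before anything is run. The sample IDs, directory names and the `_R1`/`_R2` read-file naming are hypothetical placeholders, and DOMINO expects its own naming scheme as described in its instructions. The SPAdes call uses the standard paired-end options (`-1`, `-2`, `-o`). For DM_MarkerScan_v1.1.pl, only the options quoted above are used; repeating each flag once per file is an assumption, and any other options the script requires are left out.

```python
import shlex
from pathlib import Path

# Hypothetical sample IDs and directories; replace with your own.
SAMPLES = ["sampleA", "sampleB"]
READS_DIR = Path("clean_reads")
ASSEMBLY_DIR = Path("spades_out")


def read_files(sample):
    """Return the (R1, R2) trimmed read files for a sample (naming is assumed)."""
    return (READS_DIR / f"{sample}_R1.fastq", READS_DIR / f"{sample}_R2.fastq")


def spades_command(sample):
    """Paired-end SPAdes run; SPAdes writes contigs.fasta into the -o directory."""
    r1, r2 = read_files(sample)
    return ["spades.py", "-1", str(r1), "-2", str(r2),
            "-o", str(ASSEMBLY_DIR / sample)]


def markerscan_command(samples):
    """DM_MarkerScan call built only from the options quoted in this thread.

    Assumption: each flag is repeated once per file. Other options the
    script may require are omitted.
    """
    cmd = ["perl", "DM_MarkerScan_v1.1.pl", "-option", "user_assembly_contigs"]
    for sample in samples:
        cmd += ["-user_contig_files", str(ASSEMBLY_DIR / sample / "contigs.fasta")]
    for sample in samples:
        for reads in read_files(sample):
            cmd += ["-user_cleanRead_files", str(reads)]
    return cmd


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for sample in SAMPLES:
        print(shlex.join(spades_command(sample)))
    print(shlex.join(markerscan_command(SAMPLES)))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; once the file names match what DOMINO expects, the printed lines can be pasted into a shell or passed to `subprocess.run`.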

Please come back to us for any clarification or further details. Thank you very much for using DOMINO.