molevol-ub / DOMINO

Development of molecular markers in non-model organisms
GNU General Public License v3.0
wrong read pair when use -DM discovery #1

Closed linzhi2013 closed 7 years ago

linzhi2013 commented 7 years ago

Hi there, my command is: perl /home/DOMINO/bin/DM_MarkerScan_v1.0.1.pl -option user_assembly_contigs -type_input pair_end -o test/ -taxa_names spA,spB,spC -VD 0.01 -CL 40 -VL 400 -CD 1 -SLCD 1e-06 -mp 4 -p 8 -user_contig_files /home/test/data/clean_assembly_id-spA.contigs.fasta -user_contig_files /home/test/data/clean_assembly_id-spB.contigs.fasta -user_contig_files /home/test/data/clean_assembly_id-spC.contigs.fasta -user_cleanRead_files /home/test/data/reads_id-spA.clean.R1.fastq -user_cleanRead_files /home/test/data/reads_id-spA.clean.R2.fastq -user_cleanRead_files /home/test/data/reads_id-spB.clean.R1.fastq -user_cleanRead_files /home/test/data/reads_id-spB.clean.R2.fastq -user_cleanRead_files /home/test/data/reads_id-spC.clean.R1.fastq -user_cleanRead_files /home/test/data/reads_id-spC.clean.R2.fastq -DM discovery

then it gives errors: `Total time for backward call to driver() for mirror index: 00:00:26

[ Mon Oct 10 01:26:56 2016 ] Step took 00 hours, 01 minutes, and 06 seconds

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Aligning Reads Individually %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Aligning reads for reads_id-spA.clean.R1.fastq and reads_id-spC.clean.R2.fastq ...

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ERROR !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Exiting the script. Some error happened when calling bowtie for mapping the file reads_id-Conanalus.clean.R1.fastq...

Try 'perl /home/DOMINO/bin/DM_MarkerScan_v1.0.1.pl -h|--help or -man' for more info Exit program.`

Obviously, reads_id-spA.clean.R1.fastq and reads_id-spC.clean.R2.fastq are a mismatched pair: they belong to different species.

Did I do something wrong?

Thank you!

JFsanchezherrero commented 7 years ago

Hi there!

First of all, thank you very much for using DOMINO and helping us improve it.

I think there are two points here.

I have checked, and we used to have a control step when mapping paired-end read files to avoid exactly what is happening here (mismatched paired-end files):

Aligning reads for reads_id-spA.clean.R1.fastq and reads_id-spC.clean.R2.fastq ...

But we dropped that check when we implemented something new. So, let me work on it and fix it. I will let you know when this is ready.

On the other hand, just a question. You use

-taxa_names spA,spB,spC

but then DOMINO reports an error when using the file reads_id-Conanalus.clean.R1.fastq. I don't know whether you wrote spA, spB and spC just to simplify the question. You should use the same names in the -taxa_names option and in the ids of the read and contig files you provide.

So, again, let me work on this issue and fix it and I will let you know as soon as possible.

Cheers!

linzhi2013 commented 7 years ago

What a fast response! Thanks a lot!

Yes, I used spA, spB and spC to simplify the question, i.e., I replaced the real species names on GitHub.

The whole message would be,

my command is:

perl /home/DOMINO/bin/DM_MarkerScan_v1.0.1.pl -option user_assembly_contigs -type_input pair_end -o test/ -taxa_names spA,spB,spC \

-VD 0.01 -CL 40 -VL 400 -CD 1 -SLCD 1e-06 -mp 4 -p 8 \

-user_contig_files /home/test/data/clean_assembly_id-spA.contigs.fasta \

-user_contig_files /home/test/data/clean_assembly_id-spB.contigs.fasta \

-user_contig_files /home/test/data/clean_assembly_id-spC.contigs.fasta \

-user_cleanRead_files /home/test/data/reads_id-spA.clean.R1.fastq -user_cleanRead_files /home/test/data/reads_id-spA.clean.R2.fastq \

-user_cleanRead_files /home/test/data/reads_id-spB.clean.R1.fastq -user_cleanRead_files /home/test/data/reads_id-spB.clean.R2.fastq \

-user_cleanRead_files /home/test/data/reads_id-spC.clean.R1.fastq -user_cleanRead_files /home/test/data/reads_id-spC.clean.R2.fastq \

-DM discovery

then it gives errors:

Total time for backward call to driver() for mirror index: 00:00:26

[ Mon Oct 10 01:26:56 2016 ] Step took 00 hours, 01 minutes, and 06 seconds

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Aligning Reads Individually %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Aligning reads for reads_id-spA.clean.R1.fastq and reads_id-spC.clean.R2.fastq ...

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ERROR !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Exiting the script. Some error happened when calling bowtie for mapping the file reads_id-spA.clean.R1.fastq...

Try 'perl /home/DOMINO/bin/DM_MarkerScan_v1.0.1.pl -h|--help or -man' for more info

Exit program.

Thank you very much!

Best wishes,

Guanliang

JFsanchezherrero commented 7 years ago

Hi there!

I just discovered something while checking the code.

As I said, we dropped this check because we expected the user to provide appropriately tagged files. So, for the version you are currently using, you need to name your files as follows:

One input file per taxa named as "[xxid-][yyy][_Rn].fastq" or "[yyy][_Rn].fastq", where:

'[xxid-]' might be present or not. [xx] could be any character (or none). Please avoid using dots (.)

'[yyy]' taxon identifier. [Mandatory]

'[_Rn]' If paired-end data, R1 or R2, for the left and the right reads, respectively.

In the example you were using before, you should provide files such as reads_id-spC_R1.fastq. Please take into account that "spC" is the taxon identifier and "_R" is the paired-end read identifier; both "_" and "R1 or R2" are mandatory.
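The naming rule above can be sketched as a small check that could be run on the read files before launching DOMINO. This is only an illustration of the rule as quoted in this thread, not DOMINO's own parsing code: the regular expression and the helper names `parse_read_name` and `pair_reads` are assumptions made for this example, and the sketch deliberately rejects any name that lacks the "_R1"/"_R2" tag.

```python
import re
from collections import defaultdict

# Assumed reading of the rule "[xxid-][yyy][_Rn].fastq":
#   optional prefix ending in "id-" (no dots), taxon identifier, "_R1" or "_R2", ".fastq".
# This is a hypothetical pattern written for this example, not taken from DOMINO.
READ_NAME = re.compile(r"^(?:[^.]*?id-)?(?P<taxon>.+)_R(?P<mate>[12])\.fastq$")


def parse_read_name(filename):
    """Return (taxon, mate) for a conforming file name, or None otherwise."""
    match = READ_NAME.match(filename)
    if match is None:
        return None
    return match.group("taxon"), int(match.group("mate"))


def pair_reads(filenames):
    """Group file names by taxon and return {taxon: (R1 file, R2 file)}."""
    by_taxon = defaultdict(dict)
    for name in filenames:
        parsed = parse_read_name(name)
        if parsed is None:
            raise ValueError(f"{name} does not follow [xxid-][yyy][_Rn].fastq")
        taxon, mate = parsed
        by_taxon[taxon][mate] = name
    pairs = {}
    for taxon, mates in by_taxon.items():
        # Every taxon needs exactly one left (R1) and one right (R2) file.
        if set(mates) != {1, 2}:
            raise ValueError(f"taxon {taxon} is missing R1 or R2")
        pairs[taxon] = (mates[1], mates[2])
    return pairs
```

With names such as reads_id-spA_R1.fastq and reads_id-spA_R2.fastq, `pair_reads` groups the two mates under "spA"; a name like reads_id-spC.clean.R1.fastq is reported as non-conforming instead of being silently paired with another taxon.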

If you provide files such as reads_id-spC.clean.R1.fastq, then the tags in the -taxa_names option would have to be -taxa_names spC.clean,spA.clean etc. But I am afraid that is not your case: your taxa are named spC or spA, not spA.clean.

Thanks to this issue and your comments, we have seen that some points are not clearly explained in the documentation. We will work on improving the documentation and also on catching some of these errors.

Let me know if it worked for you so that we can close this issue. Again, thank you very much for your comments and for helping us improve DOMINO.

If you have any other suggestions or issues, feel free to comment.
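For files that carry extra dotted tags, as in this thread, the new names could be computed with something like the sketch below. It assumes the files look like "<prefix>id-<taxon>.<extra tags>.R1.fastq" and that the taxon identifier itself contains no dots; the pattern and the helper names `conforming_name` and `rename_plan` are hypothetical and not part of DOMINO.

```python
import re
from pathlib import Path

# Hypothetical pattern for names like "reads_id-spC.clean.R1.fastq":
# prefix ending in "id-", taxon without dots, optional dotted extras, then ".R1"/".R2".
DOTTED = re.compile(
    r"^(?P<prefix>[^.]*?id-)(?P<taxon>[^.]+)(?:\.[^.]+)*\.R(?P<mate>[12])\.fastq$"
)


def conforming_name(filename):
    """Map e.g. 'reads_id-spC.clean.R1.fastq' to 'reads_id-spC_R1.fastq'.

    Names that do not match the assumed dotted layout are returned unchanged.
    """
    match = DOTTED.match(filename)
    if match is None:
        return filename
    return f"{match['prefix']}{match['taxon']}_R{match['mate']}.fastq"


def rename_plan(directory):
    """List (old path, new path) pairs for the .fastq files in a directory.

    Nothing is renamed here; the plan is meant to be reviewed first.
    """
    plan = []
    for path in sorted(Path(directory).glob("*.fastq")):
        new_name = conforming_name(path.name)
        if new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan
```

After checking the plan, each pair could be applied with `old.rename(new)`, or with symbolic links if the original file names need to stay in place.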

Cheers

Jose F.

linzhi2013 commented 7 years ago

Hi there!

Thank you! It seems to be working now (it is still running, but it prints the correct message):

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Aligning Reads Individually %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Aligning reads for reads_id-spA_R1.fastq and reads_id-spA_R2.fastq ...

Yes, I have read the documentation. When I ran the program for the first time, I just copied the commands from the help information (perl DM_MarkerScan_v1.0.1.pl -h). So I would suggest updating the help information, as it seems to have some problems; then it would be perfect.

Thank you!

Guanliang

JFsanchezherrero commented 7 years ago

Great Guanliang!!

Bear in mind that it might take some time: even with the latest implementations of Bowtie, indexing the reference, mapping the reads and developing the markers is always tricky. But as long as it is running...

We will be releasing updates that improve RAM usage and CPU parallelization.

Thank you very much

Jose F.