Open civanovich-senck opened 4 years ago
Hi there Cristobal,
Definitely, something is going wrong here. Let's work it out.
I am sorry, but I cannot understand exactly what you mean by: "Which doesnt make sense as started the fasta assemblies from SPAdes, and also tried converting these fastas to .fq, but I still get that."
Do you mean that you provide the DOMINO marker script with FASTA assemblies (assembled elsewhere) and FASTQ reads, or something else?
Please provide the full log and the full command call. You can mask the full paths of the files if you wish, or send everything to me by email if you prefer.
I have just seen that you previously opened an issue (#12). Check the details there on providing reads and assembled contigs.
Thanks
Hi José,
What I meant was that I'm using the clean reads coming out of DOMINO plus assemblies I made from these clean reads with SPAdes. As you saw in my previous issue, I was never truly able to run the assembly portion of the pipeline. What I also did was force a conversion of the assemblies from .fasta to .fq in order to bypass the "Get FASTQ files of the contigs generated" flag.
A command example:
perl "/fslgroup/fslg_Lecanomics/DOMINO/bin/DM_MarkerScan_v1.1.pl" -option user_assembly_contigs -type_input pair_end -o "/zhome/fslcollab260/DOMINO_output_MarkerDisc/" -taxa_names berm9,bermCTAB,cadu255,cadu255B,carp385,disp377,intm388,lec391,mdeus387,pmur380,poly381,rupi384,sarc11,sarcC,sarcJ,subcar389,subint237,subint237B,var239 -VD -1 -CL 1 -VL 10 -CD 50 -SLCD 1e-05 -MCT 1 -rdgopen 5 -rdgexten 3 -mp 4 -max_SoftClipping 10 -MPA 0.1 -SI 50 -p 28 -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_berm9_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_bermCTAB_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_cadu255_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_cadu255B_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_carp385_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_disp377_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_intm388_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_lec391_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_mdeus387_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_pmur380_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_poly381_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_rupi384_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_sarc11_contigs.fasta" -user_contig_files 
"/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_sarcC_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_sarcJ_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_subcar389_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_subint237_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_subint237B_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_var239_contigs.fasta" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R1.fastq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R2.fastq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255B_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255B_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-carp385_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-carp385_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-disp377_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-disp377_R2.fq" -user_cleanRead_files 
"/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-intm388_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-intm388_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-lec391_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-lec391_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R1.fq" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-pmur380_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-pmur380_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-poly381_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-poly381_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-rupi384_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-rupi384_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarc11_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarc11_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcC_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcC_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcJ_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcJ_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subcar389_R1.fq" 
-user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subcar389_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237B_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237B_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-var239_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-var239_R2.fq" -DM discovery
And its output: slurm-39087745.txt
Right now I'm testing the assembly pipeline again on another supercomputing server; let's see what comes out of that.
Thanks, Cristóbal
Hi Cristobal,
I still cannot get what you mean with "force a conversion of the assemblies from .fasta to .fq in order to bypass...".
What I guess is happening here is a misunderstanding with terms. Basically, what DOMINO does is to map sequencing reads (R1 & R2 or single end) to a reference, either the closest genome provided (by user) or the assemblies previously generated.
To be clear, let's work through an example. I will keep it simple and use just 3 samples: Dmelanogaster, Dsimulans and Dyakuba. As sequencing read files, I would have:
Once cleaned, these reads would be renamed to:
Imagine we do not have a close, well-assembled reference. We would need to create an assembly for each taxon. After the assembly, I would have:
Now, for the marker discovery, we need to provide DOMINO with clean sequencing reads AND assembled contigs. Both are mandatory (there is one specific circumstance in which they are not, but I will not go into the details for this example).
This would be the command:
perl DM_MarkerScan_v1.1.pl -option user_assembly_contigs -type_input pair_end -o test/
-taxa_names Dmelanogaster,Dsimulans,Dyakuba -VD 0.01 -CL 40 -VL 400 -CD 1 -SLCD 1e-06 -mp 4
-user_contig_files path_to_file1/clean_assembly_id-Dmelanogaster.contigs.fasta
-user_contig_files path_to_file2/clean_assembly_id-Dsimulans.contigs.fasta
-user_contig_files path_to_file3/clean_assembly_id-Dyakuba.contigs.fasta
-user_cleanRead_files reads_id-Dmelanogaster.clean.R1.fastq -user_cleanRead_files reads_id-Dmelanogaster.clean.R2.fastq
-user_cleanRead_files reads_id-Dsimulans.clean.R1.fastq -user_cleanRead_files reads_id-Dsimulans.clean.R2.fastq
-user_cleanRead_files reads_id-Dyakuba.clean.R1.fastq -user_cleanRead_files reads_id-Dyakuba.clean.R2.fastq
-DM discovery
So, I can see your command is correct, but I am afraid you misunderstood something and are not providing reads but contigs that you renamed.
If this is not the case and you proceeded correctly, please send me the file DOMINO_dump_information.txt from the mapping folder that was generated. No mapping is being done, either because this folder is empty or because something else is happening.
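With many taxa, a command like the example above gets long enough that a single mistyped flag is easy to miss. A minimal sketch of assembling the argument list programmatically follows. The flags and file-name patterns are copied from the example command; the helper name `build_markerscan_args` and the directory names are hypothetical, the numeric parameters (-VD, -CL, and so on) are left out for brevity, and the sketch only prints the command rather than running it.

```python
import shlex

def build_markerscan_args(taxa, contig_dir, reads_dir, out_dir):
    """Assemble a DM_MarkerScan argument list for paired-end reads.

    File-name patterns follow the example command above; adjust them
    to match how your own files are actually named.
    """
    args = [
        "perl", "DM_MarkerScan_v1.1.pl",
        "-option", "user_assembly_contigs",
        "-type_input", "pair_end",
        "-o", out_dir,
        "-taxa_names", ",".join(taxa),
    ]
    # One -user_contig_files entry per taxon.
    for taxon in taxa:
        args += ["-user_contig_files",
                 f"{contig_dir}/clean_assembly_id-{taxon}.contigs.fasta"]
    # Two -user_cleanRead_files entries (R1 and R2) per taxon.
    for taxon in taxa:
        for mate in ("R1", "R2"):
            args += ["-user_cleanRead_files",
                     f"{reads_dir}/reads_id-{taxon}.clean.{mate}.fastq"]
    args += ["-DM", "discovery"]
    return args

# Hypothetical directories, used only for illustration.
taxa = ["Dmelanogaster", "Dsimulans", "Dyakuba"]
cmd = build_markerscan_args(taxa, "assemblies", "clean_reads", "test/")
print(shlex.join(cmd))
```

Because every read file is generated inside the same loop, each one is guaranteed to sit under -user_cleanRead_files, and each taxon gets exactly one contig file and one R1/R2 pair.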
Thanks
Hi José
> I still cannot get what you mean with "force a conversion of the assemblies from .fasta to .fq in order to bypass...".
What I did (after failing so hard with normal files) was use a script to transform some of my contigs.fasta files into contigs.fq by giving them fake quality scores, in order to bypass the "Get FASTQ files of the contigs generated" flag.
> What I guess is happening here is a misunderstanding with terms. Basically, what DOMINO does is to map sequencing reads (R1 & R2 or single end) to a reference, either the closest genome provided (by user) or the assemblies previously generated
No. I am using the clean reads generated by DOMINO and the assemblies I generated with SPAdes from these same clean reads. For example, for taxon berm9:
-user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_berm9_contigs.fasta"
-> contigs assembled with SPAdes from the cleaned reads output by DOMINO
-user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R2.fq"
-> the pair of reads cleaned by DOMINO.
Some files generated in the mapping folder: 202011200742_Mapping-Parameters.txt DOMINO_dump_param.txt
The file DOMINO_dump_information for this run is empty!
Another thing I want to bring to your attention regarding DM_Clean is an issue with the number-of-cores option -p: it seems that despite my flag requesting 28 cores, the pipeline defaults to using 2. The IT person at the supercomputing center brought this to my attention. Quoting him:
> It's possible that the problem just doesn't scale well, or perhaps multiple cores are only used in a small part of the workload, in which case just using a few cores is probably the way to go.
Any clues on this issue?
Cristóbal
Hi there Cristobal,
I guess I found a bug! DOMINO is not correctly processing external assemblies and reads. I will come back with a solution in a few days and let you know as soon as possible.
Regarding the topic of transforming contigs.fasta to contigs.fq,
> what I did (after failing so hard with normal files), was that through a script, I transformed some of my contigs.fasta to contigs.fq by giving them some fake scores, in order to bypass the Get FASTQ files of the contigs generated flag.
I can see what you did, but I can NOT understand the purpose of it. What was the point? Are you using them in the end? Was it just a desperate action in response to the message "Get FASTQ files of the contigs generated"? Maybe the message is badly worded. It might be appropriate to rewrite it as "Get FASTQ files that assembled the contigs".
Finally, regarding CPU usage: as the IT person mentioned, it may happen that in some steps of the process the CPUs provided are not fully used. In some cases it is difficult to implement threading, especially where third-party tools offer no parallel solution, and in other cases there are limitations in data processing.
I can assure you that for most of the mapping and marker discovery steps, threads are implemented and, most of the time, fully used. Anyhow, it might be appropriate to set the number of CPUs according to your system's availability and other users' workload.
I will come back with a solution. Thanks for the detailed information and comments.
Have a nice day
Hi José
> Was it just a desperate action for the message "Get FASTQ files of the contigs generated".
Yep, basically.
> Anyhow, it might be appropriate to set a number of CPU according to your system disponibility and other users workload
For me it has basically been a matter of eyeballing the number of cores and RAM. I had to migrate from our supercomputers here in Frankfurt to the computing resources in Utah, because I was overusing disk space, RAM and computing time with DOMINO (which also led to some very stern calls and emails from the former IT researcher). The situation is odd, because a DM_Clean run of 4 reads, plus added reference genomes as a database for mapping, took approx. 10 hrs on 12 cores, whereas on the American server my tests took between 14 and 16 hrs on 28 cores.
Stay safe!
Hi Cristobal,
Sorry for the delay.
I have been working on this issue today, but it does work for me. I tried something similar to what you did: I used DOMINO to clean and trim the reads, used SPAdes externally to assemble them, renamed the files and then used the DOMINO marker script to create markers. The command was:
perl ../bin/DM_MarkerScan_v1.1.pl -option user_assembly_contigs -type_input single_end -o test/ -taxa_names sp1,sp2,sp3,sp4 -VD 0.01 -CL 40 -VL 400 -CD 1 -SLCD 1e-06 -mp 4 -user_contig_files ./assembly_spades/rename/clean_assembly_id-sp1.contigs.fasta -user_contig_files ./assembly_spades/rename/clean_assembly_id-sp2.contigs.fasta -user_contig_files ./assembly_spades/rename/clean_assembly_id-sp3.contigs.fasta -user_contig_files ./assembly_spades/rename/clean_assembly_id-sp4.contigs.fasta -user_cleanRead_files ./test/202012081052_DM_clean_data/QC-filtered_id-sp1.fastq -user_cleanRead_files ./test/202012081052_DM_clean_data/QC-filtered_id-sp2.fastq -user_cleanRead_files ./test_example/test/202012081052_DM_clean_data/QC-filtered_id-sp3.fastq -user_cleanRead_files ./test_example/test/202012081052_DM_clean_data/QC-filtered_id-sp4.fastq -DM discovery
I tried it using the 4 FASTQ read files provided with the example. I haven't tested paired-end, but I doubt the problem lies there. Can you try cleaning out all the old folders first? That is, all files and folders generated, such as 2020...Mapping/Mapping_old_xx/Markers, etc.
Try using just a couple of assemblies and 4-5 samples. Let me know what happens.
About the performance difference, I am afraid this is a common issue between different compute nodes. Each node has its own RAM, CPU type and capacity, and these differences produce differences in performance (not only for DOMINO). For example, I have recently encountered differences up to 50x in time for a simple awk process (2 million gunzipped lines), ranging from 10 minutes on one node to 12 hours on other nodes.
Best,
Ok, I might have found where the problem is. Check this out.
This is a summary of your previous command:
perl "/fslgroup/fslg_Lecanomics/DOMINO/bin/DM_MarkerScan_v1.1.pl" -option user_assembly_contigs -type_input pair_end -o "/zhome/fslcollab260/DOMINO_output_MarkerDisc/" -taxa_names berm9,bermCTAB,cadu255,cadu255B,carp385,disp377,intm388,lec391,mdeus387,pmur380,poly381,rupi384,sarc11,sarcC,sarcJ,subcar389,subint237,subint237B,var239 -VD -1 -CL 1 -VL 10 -CD 50 -SLCD 1e-05 -MCT 1 -rdgopen 5 -rdgexten 3 -mp 4 -max_SoftClipping 10 -MPA 0.1 -SI 50 -p 28 -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_berm9_contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id_bermCTAB_contigs.fasta" ... -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R1.fastq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R2.fastq" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-pmur380_R1.fq" ... -DM discovery
First of all, contig file names should follow the pattern _xxxid-name.contigs.fasta. Take into account the characters "_" and "-"; you did not use them correctly.
The clean read names are OK.
In the middle of the clean reads, you include a user_contig_files entry that points to a reads file. This might be what is causing the problem!
Take this into account!
Let me know what happens with these fixed settings.
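Both problems described above (contig file names that do not follow the expected pattern, and a reads file passed under user_contig_files) can be caught before launching a run. The following is a rough sketch and not part of DOMINO: the function name `check_markerscan_args` is hypothetical, the regular expression is only one reading of the naming rule stated above (`id-<name>.contigs.fasta`), and the command string is a shortened stand-in with made-up relative paths.

```python
import re
import shlex

# Assumed pattern: "...id-<name>.contigs.fasta", per the naming rule above.
CONTIG_RE = re.compile(r"id-([^./]+)\.contigs\.fasta$")
READ_EXTS = (".fq", ".fastq")

def check_markerscan_args(command):
    """Return a list of human-readable problems found in a command string."""
    tokens = shlex.split(command)
    problems = []
    # Walk over (flag, value) pairs of adjacent tokens.
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-user_contig_files":
            if value.endswith(READ_EXTS):
                problems.append(f"reads file passed as contigs: {value}")
            elif not CONTIG_RE.search(value):
                problems.append(f"unexpected contig file name: {value}")
        elif flag == "-user_cleanRead_files" and not value.endswith(READ_EXTS):
            problems.append(f"unexpected reads file name: {value}")
    return problems

# Shortened stand-in for the real command; paths are illustrative only.
command = (
    'perl DM_MarkerScan_v1.1.pl -taxa_names berm9,mdeus387 '
    '-user_contig_files "assemblies/id_berm9_contigs.fasta" '
    '-user_cleanRead_files "clnreads_id-mdeus387_R1.fq" '
    '-user_contig_files "clnreads_id-mdeus387_R2.fq"'
)
for problem in check_markerscan_args(command):
    print(problem)
```

On this stand-in command the check reports one contig file whose name does not match the assumed pattern and one reads file given under the contig flag.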
Thanks,
Best
Hi José,
OK, things are advancing. I completely missed that underscore and also the underscore in _contigs.fasta. The run is still not generating markers, but I've seen that bowtie is not being called correctly:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Aligning Reads Individually %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ERROR !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Exiting the script. Some error happened when calling bowtie for mapping the file /lustre/scratch/grp/fslg_Lecanomics/clnreads_id-bermCTAB_R1.fastq...
Try 'perl /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MappingReads.pl -h|--help or -man' for more information. Exit program.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [ Wed Dec 9 06:14:57 2020 ]
Try perl /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MappingReads.pl -man for more information
Also, in one of my tests, the output shows basically the same bowtie problem, plus 50 MB worth of this message:
###################### Fetching information from all the PROFILEs generated #####################
- Checking profiles of variation for each contig and merging information...
- Using a sliding window approach...
- Using parallel threads (12 CPUs)...
- Dataset would be splitted for speeding computation into 400 subsets... Use of uninitialized value in concatenation (.) or string at /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerDiscovery.pl line 310.
This message appears after mothur is called.
Thanks for taking the time on this.
Cristóbal
Hi there,
Can you provide me with the full command you ran?
Make sure you clean out all the markers and mapping folders previously generated.
Set the option --debug and provide me with the log and error details; use a txt file if necessary.
Let's see what is going on and fix it!
Regards
Hi José,
Command:
perl "/fslgroup/fslg_Lecanomics/DOMINO/bin/DM_MarkerScan_v1.1.pl" -option user_assembly_contigs -type_input pair_end -o "/zhome/fslcollab260/DOMINO_output_MarkerDisc/" -taxa_names berm9,bermCTAB,cadu255,cadu255B,carp385,disp377,intm388,lec391,mdeus387,pmur380,poly381,rupi384,sarc11,sarcC,sarcJ,subcar389,subint237,subint237B,var239 -VD 0.01 -CL 40 -VL 400 -CD 1 -SLCD 1e-06 -mp 4 -p 12 --debug -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-berm9.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-bermCTAB.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-cadu255.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-cadu255B.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-carp385.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-disp377.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-intm388.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-lec391.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-mdeus387.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-pmur380.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-poly381.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-rupi384.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-sarc11.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-sarcC.contigs.fasta" 
-user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-sarcJ.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-subcar389.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-subint237.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-subint237B.contigs.fasta" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/assemblies/id-var239.contigs.fasta" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-berm9_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R1.fastq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-bermCTAB_R2.fastq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255B_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-cadu255B_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-carp385_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-carp385_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-disp377_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-disp377_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-intm388_R1.fq" -user_cleanRead_files 
"/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-intm388_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-lec391_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-lec391_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R1.fq" -user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-pmur380_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-pmur380_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-poly381_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-poly381_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-rupi384_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-rupi384_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarc11_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarc11_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcC_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcC_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcJ_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-sarcJ_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subcar389_R1.fq" -user_cleanRead_files 
"/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subcar389_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237B_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-subint237B_R2.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-var239_R1.fq" -user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-var239_R2.fq" -DM discovery
Attached are some of the output files
202012100555_Mapping_ERROR.txt 202012100555_Markers-Parameters.txt DOMINO_dump_information.txt
Cheers, Cristóbal
Hi there,
I have checked the DOMINO logs and the additional information provided. It seems there is a problem with the bowtie call.
We are going to check the version you are using within DOMINO. Can you execute the following script and provide me with the output?
perl bin/scripts/DM_DOMINO_dependencies.pl
On the other hand, just one comment on your command: for sample mdeus387, fix the flag on the R2 input:
-user_cleanRead_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R1.fq"
-user_contig_files "/fslhome/fslcollab260/fsl_groups/fslg_Lecanomics/compute/clnreads_id-mdeus387_R2.fq"
Change user_contig_files to user_cleanRead_files. I don't think this is the problem, at least not this early in the process.
Can you re-run the perl DOMINO command (with --debug mode ON) with this last issue fixed and send me the same files as before (_Mapping_ERROR.txt, _Markers-Parameters.txt, DOMINO_dump_information.txt, slurm-*.txt)?
Also, please send me any of the files generated within /zhome/fslcollab260/DOMINO_output_MarkerDisc/**_DM_mapping/berm9, as it seems to be the first sample analyzed and the one causing problems.
Thanks in advance!
Best,
Hi,
The output of DM_DOMINO_dependencies.pl:
`################################################################################################# ############################################ MODULES ############################################ #################################################################################################
Checking perl module dependencies...
Checking module: Getopt::Long....................Getopt/Long.pm [OK]
Checking module: Pod::Usage....................Pod/Usage.pm [OK]
Checking module: Data::Dumper....................Data/Dumper.pm [OK]
Checking module: POSIX....................POSIX.pm [OK]
Checking module: FindBin....................FindBin.pm [OK]
Checking module: DOMINO....................DOMINO.pm [OK]
Checking module: File::Copy....................File/Copy.pm [OK]
Checking module: File::Find;.................... [X]
ATTENTION: File/Find; is missing but DOMINO might still work appropiate...]
Checking module: List::Uniq....................List/Uniq.pm [OK]
Checking module: File::Path....................File/Path.pm [OK]
Checking module: Cwd....................Cwd.pm [OK]
Checking module: Parallel::ForkManager....................Parallel/ForkManager.pm [OK]
Checking module: Spreadsheet::WriteExcel....................Spreadsheet/WriteExcel.pm [OK]
Checking module: Time::HiRes....................Time/HiRes.pm [OK]
Checking module: List::Util....................List/Util.pm [OK]
################################################################################################# ############################################ BINARIES ########################################### #################################################################################################
Checking binary dependencies from other sources...
Checking BLAST:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../NCBI_BLAST/
Checking bowtie2 v2.2.9:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../bowtie2-2.2.9/
Checking samtools v1.3.1:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../samtools-1.3.1/samtools
Checking mothur v1.32:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../MOTHUR_v1.32.0/mothur
Checking CAP3:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../cap3/bin/cap3
Checking MIRA:... /zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../mira_v4.0/bin/mira
################################################################################################# ############################################# UTILS ############################################# #################################################################################################
Checking perl scripts...
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_GeneratePileup.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_DOMINO_dependencies.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerSliding.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_PrintExcel.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerValidate.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MappingReads.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerClusterize.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerOverlap.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_runSPAdes.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_runMIRA.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_ContigStats.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/DM_MarkerDiscovery.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../DM_Assembly_v1.1.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../DM_Clean_v1.1.pl syntax OK
Checking..................../zgrouphome/fslg_Lecanomics/DOMINO/bin/scripts/../DM_MarkerScan_v1.1.pl syntax OK
and the files requested:
202012110739_Mapping_ERROR.txt 202012110739_Markers-Parameters.txt DOMINO_dump_information.txt slurm-39273687.txt reference_berm9.rev.1.bt2.gz reference_berm9.1.bt2.gz reference_berm9.4.bt2.gz reference_berm9.rev.2.bt2.gz reference_berm9.2.bt2.gz contigs_berm9_length.txt.gz reference_berm9.3.bt2.gz reads_bermCTAB-reference_berm9_mapping_logfile.txt.gz reads_bermCTAB-reference_berm9_logfile.txt.gz index_genome_reference_berm9.success.gz reference_berm9-taxa_bermCTAB.sam.gz mapping_bermCTAB.failed.gz mapping_ref_berm9.success.gz
I'm also in contact with the IT people at the supercomputing center; it seems I also have a perl5 issue, and I told them about bowtie misbehaving.
Cheers, Cristóbal
`
Hi there,
I think I have found it and fixed it.
The problem was not related to bowtie. There was a problem with the way Perl rounds floating-point numbers within DOMINO. We split the CPUs provided across jobs in order to run as many as possible at once. The number of CPUs for each job was computed as number_CPUs/number_species_to_map, but we did not take into account the case where this value is < 0.5. In that case, 0 CPUs were assigned to each job.
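To illustrate the kind of failure described above, here is a small sketch in Python rather than Perl. It is not DOMINO's code: the function names are hypothetical, and rounding to the nearest integer is an assumption about how the per-job value was computed. The point is only that a quotient below 0.5 rounds to 0 unless a minimum of 1 is enforced.

```python
def cpus_per_job_naive(total_cpus, n_jobs):
    """Divide CPUs across jobs and round to the nearest integer (assumed behaviour)."""
    return round(total_cpus / n_jobs)

def cpus_per_job_safe(total_cpus, n_jobs):
    """Same division, but never hand a job fewer than one CPU."""
    return max(1, round(total_cpus / n_jobs))

# When the quotient is below 0.5, the naive value collapses to 0.
print(cpus_per_job_naive(4, 19))  # 4/19 is about 0.21
print(cpus_per_job_safe(4, 19))
print(cpus_per_job_safe(28, 4))
```

A mapper asked to run with 0 threads will typically fail or refuse to start, which would be consistent with the error appearing at the bowtie step even though bowtie itself was installed correctly.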
I have updated the code for the DM_MappingReads.pl script. You should update your version too by doing:
git pull https://github.com/molevol-ub/DOMINO.git
cp src/perl/scripts/DM_MappingReads.pl bin/scripts/DM_MappingReads.pl
Give it a try and let me know how it works.
P.S. If you need further help to update the code, let me know.
Hi José, after updating the script, I ran DOMINO over the weekend. It seems that bowtie is working fine, but I see error flags when calling samtools, and no markers were discovered.
Attached are some files from the last run and a screenshot of the CPU usage stats. It seems that DOMINO doesn't handle splitting jobs across nodes?
Cheers, Cristóbal
202012121443_Mapping_ERROR.txt 202012121443_Markers-Parameters.txt DOMINO_dump_information.txt slurm-39296113.txt
Dear Cristobal,
Sorry for the delay. I have been very busy lately and I haven't checked it yet.
We currently have limited funds for this project, which makes it difficult to resolve issues. I will try to have a look in the coming days.
Best wishes
Dear Developers,
After running my 20 genomes through the Marker Discovery pipeline around 100 times (trying different parameter combinations), I always end up with the same result: "No markers identified". Even with genomes belonging to the same species, the pipeline finds nothing. Another weird thing is that the whole analysis takes between 0 and 3 secs, so I guess no mapping step is happening. Also, I always get this flag:
%%%%%%%%%%%%%%%%%%%%%%%%%%%% Get FASTQ files of the contigs generated %%%%%%%%%%%%%%%%%%%%%%%%%%%
Which doesnt make sense as started the fasta assemblies from SPAdes, and also tried converting these fastas to .fq, but I still get that. An example of the parameters I'm using: -VD -1 -CL 1 -VL 10 -CD 50 -SLCD 1e-05 -MCT 1 -rdgopen 5 -rdgexten 3 -mp 4 -max_SoftClipping 10 -MPA 0.1 -SI 50 -p 28
Output example:
Please help!
Cristóbal