molevol-ub / DOMINO

Development of molecular markers in non-model organisms
GNU General Public License v3.0

various questions and ERROR: Can not open folder ... at DOMINO/bin/lib/DOMINO.pm line 510. #5

Closed: entobento closed this issue 6 years ago

entobento commented 6 years ago

Hi again Jose et al.

Here are a few questions that have come up while running the tool. Thank you again for all the guidance on this.

1) .allele.loci versus .loci in ipyrad

I have been able to successfully run DOMINO on my ezRAD data set, generating 29711 and 31106 markers with two different approaches. The first uses the alleles.loci output from ipyrad as the DOMINO input, with the minimum number of alleles required to produce a marker set to 3; in theory, this generates markers that have allelic variation. The second approach uses the .loci output from ipyrad to generate markers with locus-specific variation. My first question is whether using the allele.loci output from ipyrad is appropriate, and whether my understanding of it is sound.

2) ERROR: Can not open folder ... at DOMINO/bin/lib/DOMINO.pm line 510.

After running these analyses, I decided to run DOMINO on the paired-end R1 and R2 reads cleaned in ipyrad, which come from pooled ezRAD data. I realize this is not necessarily recommended, but I wanted to compare the results with the .loci outputs, and I thought that because the data were generated from pooled sequences it might be a more robust analysis. I tried to mimic the "<User provides a reference genome and reads (single end) -- Discovery>" example, but with paired-end instead of single-end reads. In the process, I also scripted it to align to a contig reference genome.

The code is as follows:

cd /home/
HOMEDIR=/For_Super/DOMINO_test/
TESTDIR=$HOMEDIR"DENOVO_Finch/"

TESTDIR=$HOMEDIR"DENOVO_Finch/"
OUTDIR=$TESTDIR"R1andR2wZebraRef"

awk '{print $1}' /cxfs/Finch/npStatBetweenPopAssessment/ZebraFinchReference.fasta > $TESTDIR"/ZebraFinchReference.fasta-short-header.fa"

perl bin/DM_MarkerScan_v1.0.2_corr.pl -option genome -type_input pair_end -o $OUTDIR \
  -taxa_names NF-A,NF-B,NF-C \
  -CL 30::200 -VP 2::999 -VL 70::400 -MCT 2 -DM discovery \
  -TempFiles -dnaSP -low_coverage_data --polymorphism --number_cpu 10 \
  -genome_fasta $TESTDIR"/ZebraFinchReference.fasta-short-header.fa" \
  -user_cleanRead_files $TESTDIR"NF-A_R1.fastq" -user_cleanRead_files $TESTDIR"NF-A_R2.fastq" \
  -user_cleanRead_files $TESTDIR"NF-B_R1.fastq" -user_cleanRead_files $TESTDIR"NF-B_R2.fastq" \
  -user_cleanRead_files $TESTDIR"NF-C_R1.fastq" -user_cleanRead_files $TESTDIR"NF-C_R2.fastq"

When I run this I get the error: ERROR: Can not open folder ... at DOMINO/bin/lib/DOMINO.pm line 510.

This corresponds to the following code:

508    sub readDir {
509       my $dir = $_[0];
510       opendir(DIR, $dir) or die "ERROR: Can not open folder $dir..."; ## FIX ADD TO ERROR-LOG

Does this analysis make sense? If so, how do I go about troubleshooting the error? In all honesty I am stumped as to where that particular section of code leads, but then again I do not code in Perl.
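A generic first step when an opendir() call dies like this is to confirm that every path the command refers to actually exists and is readable. The helper below is a hypothetical sketch, not part of DOMINO; since the posted error message elides the folder name, it cannot tell which folder failed, and checking the inputs is only a starting point.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of DOMINO). Perl's opendir()
# returns false when a folder does not exist or cannot be read, and the
# "or die" in readDir then aborts, so missing or mistyped paths are a
# reasonable first suspect.

# check_paths PATH...
# Prints every path that is missing (or an unreadable directory) to stderr
# and returns 1 if at least one problem was found, 0 otherwise.
check_paths() {
  local status=0 p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "MISSING: $p" >&2
      status=1
    elif [ -d "$p" ] && [ ! -r "$p" ]; then
      echo "UNREADABLE DIRECTORY: $p" >&2
      status=1
    fi
  done
  return "$status"
}

# Example call reusing the variables from the script above. Run it from the
# directory DOMINO is launched from, because bin/DM_MarkerScan_v1.0.2_corr.pl
# is a relative path:
#
#   check_paths bin/DM_MarkerScan_v1.0.2_corr.pl "$TESTDIR" \
#     "$(dirname "$OUTDIR")" "$TESTDIR"NF-{A,B,C}_R{1,2}.fastq
```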

Much obliged,

Adam

JFsanchezherrero commented 6 years ago

Dear Adam,

I will answer each question separately.

1. .allele.loci versus .loci in ipyrad

I am afraid we are not familiar with the allele.loci file format; I assume it is similar to .loci. We implemented a module in DOMINO to retrieve informative markers from a multiple sequence alignment in FASTA or PHYLIP format or, especially for RAD/GBS experiments, in .loci format (http://ipyrad.readthedocs.io/output_formats.html?highlight=loci). Basically, DOMINO checks each column and selects informative markers. So, as we are not familiar with the allele.loci format or with how it differs from .loci, I cannot say whether there is any advantage in using one over the other.
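To make "checks each column" concrete, here is a minimal sketch of a column-wise selection filter over a .loci-style file. It is not DOMINO's implementation. It assumes each locus is a block of "name sequence" lines closed by a line starting with "//" (as in the ipyrad .loci format linked above), treats gaps (-) and N as missing, and counts a column as variable when it holds more than one base; ambiguity codes such as R or Y are simply treated as extra states, which is a simplification. The function name select_loci and its thresholds (minimum taxa, minimum and maximum variable columns) are hypothetical.

```bash
#!/usr/bin/env bash
# select_loci FILE MIN_TAXA MIN_VAR MAX_VAR
# Hypothetical selection filter (not DOMINO code). Assumes a .loci-style file
# in which each locus is a block of "name sequence" lines closed by a line
# starting with "//". For each locus that passes the thresholds it prints:
#   locus_index (1-based, file order) <TAB> taxa <TAB> variable_columns
select_loci() {
  awk -v MINTAXA="$2" -v MINVAR="$3" -v MAXVAR="$4" '
    # Count variable columns in the current locus, report it, then reset.
    function flush(   len, i, s, b, first, nvar) {
      if (n == 0) return
      len = length(seq[1])
      nvar = 0
      for (i = 1; i <= len; i++) {
        first = ""
        for (s = 1; s <= n; s++) {
          b = substr(seq[s], i, 1)
          if (b == "-" || b == "N") continue      # gaps and N count as missing
          if (first == "") first = b
          else if (b != first) { nvar++; break }  # a second base: column varies
        }
      }
      if (n >= MINTAXA && nvar >= MINVAR && nvar <= MAXVAR)
        printf "%d\t%d\t%d\n", locus, n, nvar
      n = 0
    }
    substr($0, 1, 2) == "//" { locus++; flush(); next }  # end of a locus
    NF >= 2 { n++; seq[n] = toupper($NF) }              # name + sequence line
    END { locus++; flush() }                            # last block without a separator
  ' "$1"
}
```

For example, select_loci data.loci 4 2 20 would print one line for each locus covering at least 4 taxa with between 2 and 20 variable columns.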

2. ERROR: Can not open folder ... at DOMINO/bin/lib/DOMINO.pm line 510.

I am afraid there is no point in running this analysis with DOMINO. Let me explain why.

DOMINO has two main modes: selection and discovery. The selection mode provides the user with informative markers that fulfil certain characteristics, such as an expected minimum variation or the number of taxa covered by each alignment. However, the input for this module would be the same as in the previous question: RAD/GBS data in .loci format, or MSA files in FASTA or PHYLIP format. It only selects, among the given alignments, those with the characteristics of interest. The discovery module, on the other hand, takes a reference (or, by default, an assembly of some of the reads), maps and aligns the reads of the other taxa to it, and searches for regions where two conserved regions flank a variable region with a given variation percentage. That module is mainly intended for designing PCR primers for further analysis in your taxa of interest, and also in other related taxa.
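The conserved-variable-conserved pattern that the discovery module looks for can be illustrated with a simplified scan over an existing alignment. This is a hypothetical sketch, not DOMINO's algorithm: it skips read mapping entirely, uses fixed lengths for the flanks and the variable region (the command in the question passes ranges such as -CL 30::200 and -VL 70::400, which appear to set these lengths), requires the flanks to be completely invariant (possibly stricter than DOMINO), and requires at least MIN_VP variable columns in between. The function name find_marker_windows and its arguments are made up for illustration.

```bash
#!/usr/bin/env bash
# find_marker_windows FILE CL VL MIN_VP
# Hypothetical, simplified illustration of a conserved-variable-conserved
# scan over an aligned FASTA file (not DOMINO's algorithm; no read mapping).
# Flanks are exactly CL columns and must be invariant; the middle region is
# exactly VL columns and must contain at least MIN_VP variable columns.
# Prints 1-based: window_start <TAB> variable_region_start <TAB> window_end
find_marker_windows() {
  awk -v CL="$2" -v VL="$3" -v MINVP="$4" '
    /^>/ { n++; next }                      # header starts a new sequence
    { seq[n] = seq[n] toupper($0) }         # join wrapped sequence lines
    END {
      len = length(seq[1])
      # var[i] = 1 if column i holds more than one base (gaps and N ignored)
      for (i = 1; i <= len; i++) {
        first = ""; var[i] = 0
        for (s = 1; s <= n; s++) {
          b = substr(seq[s], i, 1)
          if (b == "-" || b == "N") continue
          if (first == "") first = b
          else if (b != first) { var[i] = 1; break }
        }
      }
      # Prefix sums: variable columns in (a, b] = cum[b] - cum[a]
      cum[0] = 0
      for (i = 1; i <= len; i++) cum[i] = cum[i - 1] + var[i]
      total = 2 * CL + VL
      for (st = 1; st + total - 1 <= len; st++) {
        l_end = st + CL - 1                 # last column of left flank
        v_end = l_end + VL                  # last column of variable region
        r_end = v_end + CL                  # last column of right flank
        left  = cum[l_end] - cum[st - 1]
        mid   = cum[v_end] - cum[l_end]
        right = cum[r_end] - cum[v_end]
        if (left == 0 && right == 0 && mid >= MINVP)
          printf "%d\t%d\t%d\n", st, l_end + 1, r_end
      }
    }' "$1"
}
```

For instance, find_marker_windows alignment.fa 30 70 2 would list windows with 30-column invariant flanks around a 70-column region holding at least 2 variable columns. The prefix sums make each window test a few subtractions, so after the per-column pass the scan is linear in alignment length.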

I can see there is some potential in what you are doing here, but I am afraid it would not be suitable. Also, DOMINO is not designed for pooled sequences: it would interpret any file as a diploid genome or a subset of a genome.

So I am afraid you would only be able to benefit from DOMINO by using the selection module to select informative markers, and only after a program such as pyRAD or STACKS has been run.

Hope it helps. Please get in touch for further details. Regards, Jose F

entobento commented 6 years ago

Hi Jose, Thanks for the guidance on this! Best,

Adam