sebhtml / ray

Ray -- Parallel genome assemblies for parallel DNA sequencing
http://denovoassembler.sf.net
Other
65 stars 12 forks source link

Cannot use contigs from another assembler #163

Open yaximik opened 11 years ago

yaximik commented 11 years ago

Contigs were created by minia with max contig length 16091 nt. When these contigs (fasta) were used as input to Ray 2.1.0 along with other PE and SE reads (fastq), Ray exited with segmentation error. When minia contigs were not included, the same data set was assembled successfully with max contig/scaffold length of 46428 nt. I am not sure it is a bug or not, but think it would be helpful if contigs from other assemblers could be included. Here is last output to stdout:

........ Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SC-MILLib1-Herc2s10cFr1Fr2run2R1AdQ30.fastq (please wait...) [Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SC-MILLib1-Herc2s10cFr1Fr2run2R1AdQ30.fastq (please wait...) [Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run1R1AdQ30.fastq (please wait...) [Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run1R1AdQ30.fastq (please wait...) [Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run2R1AdQ30.fastq (please wait...) [Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run2R1AdQ30.fastq (please wait...) [Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SColdAll.fasta (please wait...) [Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SColdAll.fasta (please wait...) [Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SCallSanger.fasta (please wait...) [Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SCallSanger.fasta (please wait...) [Loader::load] File: /home/yaximik/AssRefMap/SC/minia/SCMiSeqAllFGMGPGIGclean_k27.contigs.fasta (please wait...) [G5NNJN1:07040] * Process received signal * [G5NNJN1:07040] Signal: Segmentation fault (11) [G5NNJN1:07040] Signal code: (128)

[G5NNJN1:07040] Failing at address: (nil)

mpiexec noticed that process rank 0 with PID 7040 on node G5NNJN1 exited on signal 11 (Segmentation fault).

sebhtml commented 11 years ago

The problem is due to a lack of support for multi-line fasta files.

=> code/plugin_SequencesLoader/FastaLoader.cpp

yaximik commented 11 years ago

Does this show where problem is (the code segment) or points to a replacement file with the fix? If the problem is in this code segment, I presume it can be circumvented by eliminating newlines from fasta entries (if I can find a tool or create a script).

sebhtml commented 11 years ago

The problem has to be fixed in the code. I just pointed (to myself) where the change need to be done.

I will look into that once I have closed these issues: #153 #146 #136

So you should reopen the issue.

sebhtml commented 11 years ago

Evaluation: 5 human-hours

sebhtml commented 11 years ago

The user used v2.1.0. Last stable is v2.2.0.

sebhtml commented 11 years ago

Merge this patch in master:

wget https://raw.github.com/sebhtml/patches/master/ray/Ray-v2.2.0-fix-for-amos-and-long-reads.patch

sebhtml commented 11 years ago

To fix this, multi-line fasta support is required.

sebhtml commented 11 years ago

https://github.com/sebhtml/ray/commit/20d20c1cef4136281b46e14115631a2326920550

gringer commented 11 years ago

Just in case it helps, here's my Java code for loading multi-line fasta/fastq files:

try {
  System.err.print("Loading from file...");
  boolean isFastQ = false;
  BufferedReader f = new BufferedReader(new FileReader(fileName));
  String nextLine = f.readLine();
  if(nextLine.startsWith("@")){
    isFastQ = true;
  }
  while(nextLine != null){
    String sequenceName = nextLine.substring(1);
    sequenceIDs.put(nextSequenceID++, sequenceName);
    StringBuilder dnaSeq = new StringBuilder();
    nextLine = f.readLine();
    while((nextLine != null) && (!nextLine.startsWith(isFastQ ? "+" : ">"))){
      dnaSeq.append(nextLine);
      nextLine = f.readLine();
    }
    if(isFastQ){
      StringBuilder qual = new StringBuilder();
      do {
        nextLine = f.readLine();
        qual.append(nextLine);
      } while(qual.length() < dnaSeq.length());
      nextLine = f.readLine();
    }
    addSequence(dnaSeq);
  }
  f.close();
  System.err.println(" done!");
} catch (FileNotFoundException e) {
  e.printStackTrace();
  throw(new RuntimeException("File not found"));
} catch (IOException e) {
  e.printStackTrace();
  throw(new RuntimeException("IO exception reading or closing file"));
}

In human-readable terms:

sebhtml commented 11 years ago

Thanks !