macmanes-lab / Oyster_River_Protocol

Official Repository of the Oyster River Protocol for Transcriptome Assembly
Creative Commons Zero v1.0 Universal

Pipeline crashed at transabyss #30

Open. kokyriakidis opened this issue 5 years ago

kokyriakidis commented 5 years ago

Here is the CMD output:

CHECKPOINT: Unitig assembly completed.
CMD: bash -euo pipefail -c 'abyss-pe graph=adj --directory=/home/orp/Oyster_River_Protocol/DATA/assemblies/SRR5330501.transabyss k=32 name=SRR5330501.transabyss.fasta j=20 in="/home/orp/Oyster_River_Protocol/DATA/rcorr/SRR5330501.TRIM_1P.cor.fq /home/orp/Oyster_River_Protocol/DATA/rcorr/SRR5330501.TRIM_2P.cor.fq" l=32 s=32 n=2 SIMPLEGRAPH_OPTIONS="--no-scaffold" OVERLAP_OPTIONS="--no-scaffold" MERGEPATH_OPTIONS="--greedy" SRR5330501.transabyss.fasta-6.fa'
The minimum coverage of single-end contigs is 2.
The minimum coverage of merged contigs is 2.
warning: the seed-length should be at least twice k: k=32, s=32
Building the suffix array...
Building the Burrows-Wheeler transform...
Building the character occurrence table...
Mateless   52894022  100%
Unaligned         0
Singleton         0
FR                0
RF                0
FF                0
Different         0
Total      52894022
abyss-fixmate: error: All reads are mateless. This can happen when first and second read IDs do not match.
error: `SRR5330501.transabyss.fasta-3.hist': No such file or directory

Everything until then went fine! Do you know what the problem is?

The read IDs look like this:

@MG00HS05:491:C7450ACXX:4:1101:1240:2223_forward/1
and
@MG00HS05:491:C7450ACXX:4:1101:1240:2223_reverse/2

These files were produced with fastq-dump. It seems the problem is the naming: the IDs should be identical, so I should remove the "_forward" and "_reverse" parts. Do you have a simple way to do that?

Can I restart the pipeline from the checkpoint above, or do I have to run it from the start?

AdamStuckert commented 5 years ago

Hi @kokyriakidis,

It looks like you are right: this error occurs because ABySS expects exactly the same header before the /1 or /2 (see: https://github.com/bcgsc/abyss/wiki/ABySS-Users-FAQ).
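A quick way to confirm the mismatch is to compare the read IDs of both files with the trailing /1 or /2 stripped. This is a minimal sketch; `mate_ids` and `check_mate_ids` are hypothetical helpers (not part of the ORP or ABySS), and they assume plain four-line FASTQ records and only inspect the first N records (1000 by default).

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the ORP or ABySS.

# mate_ids FILE [N]: print the first N read IDs of a FASTQ file.
# Assumes 4-line records, so headers are lines 1, 5, 9, ...
# Keeps only the first whitespace-separated token and drops a trailing /1 or /2.
mate_ids() {
  awk -v n="${2:-1000}" 'NR % 4 == 1 {
    id = $1
    sub(/\/[12]$/, "", id)
    print id
    if (++seen >= n) exit
  }' "$1"
}

# check_mate_ids R1 R2 [N]: compare the first N IDs of the two mate files.
# Prints "IDs match" (status 0) or "IDs differ" (status 1).
check_mate_ids() {
  local n="${3:-1000}"
  if [ "$(mate_ids "$1" "$n")" = "$(mate_ids "$2" "$n")" ]; then
    echo "IDs match"
  else
    echo "IDs differ"
    return 1
  fi
}
```

For example, `check_mate_ids SRR5330501.TRIM_1P.cor.fq SRR5330501.TRIM_2P.cor.fq` should report "IDs differ" on the original files and "IDs match" once the _forward/_reverse suffixes are removed.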

You can try this fix. Just replace $READ1, $READ2, and the output file names with your own.

# strip the _forward/_reverse suffixes so both mates share the same ID before /1 and /2
sed 's/_forward//g' "$READ1" > new_reads.1.fq
sed 's/_reverse//g' "$READ2" > new_reads.2.fq

The ORP is checkpointed, so you should be able to restart it and it will resume at the last checkpoint it passed. Note that you will need to apply the same fix to the *TRIM_*P.cor.fq reads (the files abyss-pe takes as input), or the ORP will resume and give you the same error.
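Since abyss-pe reads the corrected files in the rcorr directory (see the `in=` argument in the log above), one option is to fix those files in place. A minimal sketch, assuming the `<rcorr dir>/<sample>.TRIM_1P.cor.fq` and `TRIM_2P.cor.fq` naming seen in the log; `fix_cor_read_ids` is a hypothetical name, and the originals are kept as `.bak` copies in case something goes wrong.

```shell
#!/usr/bin/env bash
# fix_cor_read_ids RCORR_DIR SAMPLE  (hypothetical helper, not part of the ORP)
# Assumes the corrected reads are named <SAMPLE>.TRIM_1P.cor.fq and
# <SAMPLE>.TRIM_2P.cor.fq, as in the abyss-pe command in the log.
fix_cor_read_ids() {
  local dir="$1" sample="$2"
  local r1="$dir/$sample.TRIM_1P.cor.fq"
  local r2="$dir/$sample.TRIM_2P.cor.fq"
  # Edit in place, keeping the originals as *.bak
  # (both GNU and BSD sed accept the attached -i.bak form).
  sed -i.bak 's/_forward//g' "$r1"
  sed -i.bak 's/_reverse//g' "$r2"
}
```

With the paths from the log, that would be `fix_cor_read_ids /home/orp/Oyster_River_Protocol/DATA/rcorr SRR5330501`. Editing the reads also updates their modification times, which is presumably why the existing assemblies then need to be touched before rerunning.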

macmanes commented 5 years ago

For the checkpointing to work, you'll have to trick the software into thinking that the other assemblies were made with the "new" reads:

touch assemblies/*fasta

Run this before rerunning the ORP, but after you change the reads as described above.
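The touch trick presumably works because the ORP (which runs as a Makefile, oyster.mk) decides what to rebuild by comparing file timestamps the way make does; treat the exact rebuild rule as an assumption. Under that assumption, an assembly older than the reads it depends on looks stale and gets rebuilt. The hypothetical `list_stale_assemblies` below prints the assemblies such a check would rebuild; after `touch assemblies/*fasta` it should print nothing.

```shell
#!/usr/bin/env bash
# list_stale_assemblies ASM_DIR READS...  (hypothetical helper, not part of the ORP)
# Prints every *fasta file in ASM_DIR that is older than at least one of the
# given read files, i.e. what a make-style timestamp check would want to rebuild.
list_stale_assemblies() {
  local asm_dir="$1"; shift
  local asm reads
  for asm in "$asm_dir"/*fasta; do
    [ -e "$asm" ] || continue          # no match: the glob stays literal
    for reads in "$@"; do
      if [ "$reads" -nt "$asm" ]; then # bash test: reads modified after asm
        echo "$asm"
        break
      fi
    done
  done
}
```

For example, from the DATA directory, `list_stale_assemblies assemblies rcorr/*.cor.fq` would list the assemblies that look out of date after the reads were edited.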