ngless-toolkit / ngless

NGLess: NGS with less work
https://ngless.embl.de

Strange error when QC some samples #91

Closed AlessioMilanese closed 5 years ago

AlessioMilanese commented 6 years ago

Hi, I'm trying to filter reads based on quality and remove human reads, but I get the following error:

Exiting after internal error. If you can reproduce this issue, please run your script with the --trace flag and report a bug at http://github.com/luispedro/ngless/issues
/scratch/milanese/NGless_temp_dir/reads_block_selected_mapped_reference11873-6..1.fq11873-7.gz: renameFile:renamePath:rename: does not exist (No such file or directory)
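Since the failing rename targets a file inside the temporary directory, one thing that may be worth ruling out first is a problem with that directory itself. The following is only a minimal sketch: `check_tmpdir` is a hypothetical helper, not part of NGLess, and it only checks that the directory exists, is writable, and has free space reported by `df`.

```shell
#!/bin/bash
# Hypothetical helper (not part of NGLess): basic sanity checks on a
# temporary directory before re-running a job.
check_tmpdir() (
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    # Show free space on the file system that holds the directory
    df -P "$dir" | tail -n 1
    echo "ok: $dir"
)

# Demo on the default temporary directory of the current machine
check_tmpdir "${TMPDIR:-/tmp}"
```

On the cluster, the call would be `check_tmpdir /scratch/milanese/NGless_temp_dir/`, run on the same node as the job. A passing check does not rule out other causes of the rename failure.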

Some more information: I installed ngless from bioconda (ngless v0.9.1, release date: July 17 2018). I'm running this script (/scratch/milanese/TEST/QC/filter-human-merged.ngl):

ngless "0.8"
import "parallel" version "0.6"
import "mocat" version "0.0"

samples = readlines(ARGV[2])
sample = lock1(samples)
input = load_mocat_sample(ARGV[1] + '/' + sample)

input = preprocess(input, keep_singles=True) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
mapped = map(input, reference='hg19')

mapped = select(mapped) using |mr|:
    mr = mr.filter(min_match_size=45, min_identity_pc=90, action={unmatch})
    if mr.flag({mapped}):
        discard

write(as_reads(mapped), ofile=sample+'/'+sample+'.filtered.fq.gz')
collect(qcstats({fastq}), ofile='preprocessing_fqstats.txt', current=sample, allneeded=samples)

I'm submitting it to Slurm with this script (/scratch/milanese/TEST/QC/slurm_ex2.sh):

#!/bin/bash
#SBATCH -A zeller
#SBATCH -t 1-00:00
#SBATCH --mem 16G
#SBATCH -n 8
#SBATCH -o /scratch/milanese/TEST/QC/r.out 
#SBATCH -e /scratch/milanese/TEST/QC/r.err

/g/scb2/zeller/milanese/software/ANACONDA/install/bin/ngless /scratch/milanese/TEST/QC/filter-human-merged.ngl /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156 /scratch/milanese/TEST/QC/list2 -j 8
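The error message asks for a re-run with the --trace flag. A small sketch of how the same command line could be rebuilt with that flag added is shown here. `build_trace_cmd` is a hypothetical helper, and placing --trace directly after the binary is an assumption about option ordering, not something the error message specifies.

```shell
#!/bin/bash
# Hypothetical helper: print an NGLess command line with --trace added.
# $1: ngless binary, $2: script; the rest are script arguments and options.
build_trace_cmd() (
    ngless_bin="$1"
    script="$2"
    shift 2
    printf '%s\n' "$ngless_bin --trace $script $*"
)

# Paths as given in the report; the command is printed, not executed
build_trace_cmd \
    /g/scb2/zeller/milanese/software/ANACONDA/install/bin/ngless \
    /scratch/milanese/TEST/QC/filter-human-merged.ngl \
    /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156 \
    /scratch/milanese/TEST/QC/list2 -j 8
```

The printed line can replace the last line of the Slurm script. The helper joins its arguments with spaces, so it is only suitable for paths without whitespace, as is the case here.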

The file /scratch/milanese/TEST/QC/list2 contains:

CCIS12370844ST-4-0
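In the --trace output further down, lock1 receives a list of two elements ("CCIS12370844ST-4-0" and an empty string), which suggests the sample list contains an empty entry after the sample name. Whether this is related to the error is not established. As a sketch, a list can be copied without empty or whitespace-only lines like this; `clean_sample_list` is a hypothetical helper, and whether readlines still yields an empty entry from the final newline is an assumption that would need checking.

```shell
#!/bin/bash
# Hypothetical helper (not part of NGLess): copy a sample list while
# dropping empty or whitespace-only lines.
clean_sample_list() (
    src="$1"
    dst="$2"
    # grep -v exits non-zero when no line is left (or on a read error);
    # "|| true" turns both cases into an empty output file
    grep -v '^[[:space:]]*$' "$src" > "$dst" || true
)

# Demo with a throwaway file that ends in a blank line
demo_dir="$(mktemp -d)"
printf 'CCIS12370844ST-4-0\n\n' > "$demo_dir/list2"
clean_sample_list "$demo_dir/list2" "$demo_dir/list2.clean"
cat "$demo_dir/list2.clean"
```

The cleaned file would then be passed as the second script argument in place of the original list.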

The directory /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0 contains:

CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.pair.1.fq.gz
CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.pair.2.fq.gz
CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.singles.fq.gz
CCIS12370844ST-4-0_11s002754-1-1_lane6.screened.adapter.screened.hg19.pair.1.fq.gz
CCIS12370844ST-4-0_11s002754-1-1_lane6.screened.adapter.screened.hg19.pair.2.fq.gz
CCIS12370844ST-4-0_11s002754-1-1_lane6.screened.adapter.screened.hg19.singles.fq.gz
CCIS12370844ST-4-0_11s002754-1-2_lane7.screened.adapter.screened.hg19.pair.1.fq.gz
CCIS12370844ST-4-0_11s002754-1-2_lane7.screened.adapter.screened.hg19.pair.2.fq.gz
CCIS12370844ST-4-0_11s002754-1-2_lane7.screened.adapter.screened.hg19.singles.fq.gz
CCIS12370844ST-4-0_11s002754-1-2_lane8.screened.adapter.screened.hg19.pair.1.fq.gz
CCIS12370844ST-4-0_11s002754-1-2_lane8.screened.adapter.screened.hg19.pair.2.fq.gz
CCIS12370844ST-4-0_11s002754-1-2_lane8.screened.adapter.screened.hg19.singles.fq.gz
CCIS12370844ST-4-0_11s002754-1-3_lane8.screened.adapter.screened.hg19.pair.1.fq.gz
CCIS12370844ST-4-0_11s002754-1-3_lane8.screened.adapter.screened.hg19.pair.2.fq.gz
CCIS12370844ST-4-0_11s002754-1-3_lane8.screened.adapter.screened.hg19.singles.fq.gz
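The listing has one pair.1, one pair.2, and one singles file per lane. A quick way to confirm that for other sample directories is sketched here. `check_pairs` is a hypothetical helper that relies only on the `.pair.1.fq.gz` / `.pair.2.fq.gz` naming visible in the listing; it does not reproduce how load_mocat_sample itself matches files.

```shell
#!/bin/bash
# Hypothetical helper: report every *.pair.1.fq.gz file that has no
# matching *.pair.2.fq.gz file in the same directory.
check_pairs() (
    dir="$1"
    status=0
    for first in "$dir"/*.pair.1.fq.gz; do
        [ -e "$first" ] || continue   # the glob matched nothing
        second="${first%.pair.1.fq.gz}.pair.2.fq.gz"
        if [ ! -e "$second" ]; then
            echo "unpaired: $first"
            status=1
        fi
    done
    return "$status"
)

# Demo on a throwaway directory with one complete lane
demo="$(mktemp -d)"
touch "$demo/s_lane5.pair.1.fq.gz" "$demo/s_lane5.pair.2.fq.gz" "$demo/s_lane5.singles.fq.gz"
check_pairs "$demo" && echo "all pairs complete"
```

For the sample in this report the call would be `check_pairs /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0`.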

The output that I get using --trace is:

[Fri 26-10-2018 19:56:09]: # Configuration
[Fri 26-10-2018 19:56:09]:  download base URL: http://ngless.embl.de/resources/
[Fri 26-10-2018 19:56:09]:  global data directory: /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data
[Fri 26-10-2018 19:56:09]:  user directory: /home/milanese/.local/share/ngless
[Fri 26-10-2018 19:56:09]:  user data directory: /home/milanese/.local/share/ngless/data
[Fri 26-10-2018 19:56:09]:  temporary directory: /scratch/milanese/NGless_temp_dir/
[Fri 26-10-2018 19:56:09]:  keep temporary files: False
[Fri 26-10-2018 19:56:09]:  create report: True
[Fri 26-10-2018 19:56:09]:  report directory: /scratch/milanese/TEST/QC/filter-human-merged.ngl.output_ngless
[Fri 26-10-2018 19:56:09]:  color setting: AutoColor
[Fri 26-10-2018 19:56:09]:  print header: True
[Fri 26-10-2018 19:56:09]:  subsample: False
[Fri 26-10-2018 19:56:09]:  verbosity: Normal
[Fri 26-10-2018 19:56:09]:  search path:
[Fri 26-10-2018 19:56:09]: Loading modules...
[Fri 26-10-2018 19:56:09]: Validating script...
[Fri 26-10-2018 19:56:09]: Transforming script...
[Fri 26-10-2018 19:56:09]: Transformation for QC triggered for variable Variable "input" on line 7.
NGLess v0.9.1 (C) NGLess authors
http://ngless.embl.de/

When publishing results from this script, please cite the following references:

     - Coelho, L.P., Alves, R., Monteiro, P., Huerta-Cepas, J., Freitas, A.T., and Bork, P., 2018.
     NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language bioRxiv
     367755 https://doi.org/10.1101/367755

     - Li, H., 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv
     preprint arXiv:1303.3997.

     - MOCAT2: a metagenomic assembly, annotation and profiling framework. Kultima JR, Coelho LP, Forslund
     K, Huerta-Cepas J, Li S, Driessen M, et al. (2016) Bioinformatics (2016)
     doi:10.1093/bioinformatics/btw183

     - MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit. Kultima JR, Sunagawa S, Li J, Chen W, Chen
     H, Mende DR, et al. (2012) PLoS ONE 7(10): e47656. doi:10.1371/journal.pone.0047656

[Fri 26-10-2018 19:56:09]: Script OK. Starting interpretation...
[Fri 26-10-2018 19:56:09]: Interpreting [interpretIO]: __check_index_access(Lookup 'ARGV' (type unknown); original_lno=5; index1=2)
[Fri 26-10-2018 19:56:09]: Interpreting [executing module function: '__check_index_access']: NGOList [NGOString "/scratch/milanese/TEST/QC/filter-human-merged.ngl",NGOString "/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156",NGOString "/scratch/milanese/TEST/QC/list2"]
[Fri 26-10-2018 19:56:09]: Interpreting [interpretIO]: __check_index_access(Lookup 'ARGV' (type unknown); original_lno=7; index1=1)
[Fri 26-10-2018 19:56:09]: Interpreting [executing module function: '__check_index_access']: NGOList [NGOString "/scratch/milanese/TEST/QC/filter-human-merged.ngl",NGOString "/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156",NGOString "/scratch/milanese/TEST/QC/list2"]
[Fri 26-10-2018 19:56:09] Line 5: Interpreting [interpretIO]: samples = readlines(Lookup 'ARGV' as NGList NGLString[IndexOne 2])
[Fri 26-10-2018 19:56:09] Line 5: Interpreting [assignment]: readlines(Lookup 'ARGV' as NGList NGLString[IndexOne 2])
[Fri 26-10-2018 19:56:09] Line 5: Interpreting [executing module function: 'readlines']: NGOString "/scratch/milanese/TEST/QC/list2"
[Fri 26-10-2018 19:56:09] Line 6: Interpreting [interpretIO]: sample = lock1(Lookup 'samples' as NGList NGLString; __hash="8cfe7bf84bb3c439150acb509ecccbb7")
[Fri 26-10-2018 19:56:09] Line 6: Interpreting [assignment]: lock1(Lookup 'samples' as NGList NGLString; __hash="8cfe7bf84bb3c439150acb509ecccbb7")
[Fri 26-10-2018 19:56:09] Line 6: Interpreting [executing module function: 'lock1']: NGOList [NGOString "CCIS12370844ST-4-0",NGOString ""]
[Fri 26-10-2018 19:56:09] Line 6: Looking for a lock in ngless-locks/8cfe7bf8. Total number of elements is 2 (not locked: 2; not finished: 2).
[Fri 26-10-2018 19:56:09] Line 6: Acquired lock file ngless-locks/8cfe7bf8/CCIS12370844ST-4-0.lock
[Fri 26-10-2018 19:56:09] Line 6: lock1: Obtained lock file: 'ngless-locks/8cfe7bf8/CCIS12370844ST-4-0.lock'
[Fri 26-10-2018 19:56:09] Line 6: Writing stats to 'ngless-stats/8cfe7bf8/CCIS12370844ST-4-0'
[Fri 26-10-2018 19:56:09] Line 7: Interpreting [interpretIO]: input = load_mocat_sample(Lookup 'ARGV' as NGList NGLString[IndexOne 1]BOpAdd"/"BOpAddLookup 'sample' as NGLString; __perform_qc=False)
[Fri 26-10-2018 19:56:09] Line 7: Interpreting [assignment]: load_mocat_sample(Lookup 'ARGV' as NGList NGLString[IndexOne 1]BOpAdd"/"BOpAddLookup 'sample' as NGLString; __perform_qc=False)
[Fri 26-10-2018 19:56:09] Line 7: Interpreting [executing module function: 'load_mocat_sample']: NGOString "/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0"
[Fri 26-10-2018 19:56:09] Line 7: Executing load_mocat_sample transform
[Fri 26-10-2018 19:56:09] Line 7: load_mocat_sample found paired-end sample '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.pair.1.fq.gz' - '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.pair.2.fq.gz'
[Fri 26-10-2018 19:56:09] Line 7: load_mocat_sample found paired-end sample '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane6.screened.adapter.screened.hg19.pair.1.fq.gz' - '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane6.screened.adapter.screened.hg19.pair.2.fq.gz'
[Fri 26-10-2018 19:56:09] Line 7: load_mocat_sample found paired-end sample '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane7.screened.adapter.screened.hg19.pair.1.fq.gz' - '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane7.screened.adapter.screened.hg19.pair.2.fq.gz'
[Fri 26-10-2018 19:56:09] Line 7: load_mocat_sample found paired-end sample '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane8.screened.adapter.screened.hg19.pair.1.fq.gz' - '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane8.screened.adapter.screened.hg19.pair.2.fq.gz'
[Fri 26-10-2018 19:56:09] Line 7: load_mocat_sample found paired-end sample '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-3_lane8.screened.adapter.screened.hg19.pair.1.fq.gz' - '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-3_lane8.screened.adapter.screened.hg19.pair.2.fq.gz'
[Fri 26-10-2018 19:56:09] Line 7: load_mocat_sample found single-end sample '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.singles.fq.gz'
[Fri 26-10-2018 19:56:09] Line 7: load_mocat_sample found single-end sample '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane6.screened.adapter.screened.hg19.singles.fq.gz'
[Fri 26-10-2018 19:56:09] Line 7: load_mocat_sample found single-end sample '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane7.screened.adapter.screened.hg19.singles.fq.gz'
[Fri 26-10-2018 19:56:09] Line 7: load_mocat_sample found single-end sample '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane8.screened.adapter.screened.hg19.singles.fq.gz'
[Fri 26-10-2018 19:56:09] Line 7: load_mocat_sample found single-end sample '/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-3_lane8.screened.adapter.screened.hg19.singles.fq.gz'
[Fri 26-10-2018 19:56:09] Line 7: Executing paired on "/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.pair.1.fq.gz""/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.pair.2.fq.gz"""
[Fri 26-10-2018 19:56:09] Line 7: Executing paired on "/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane6.screened.adapter.screened.hg19.pair.1.fq.gz""/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane6.screened.adapter.screened.hg19.pair.2.fq.gz"""
[Fri 26-10-2018 19:56:09] Line 7: Executing paired on "/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane7.screened.adapter.screened.hg19.pair.1.fq.gz""/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane7.screened.adapter.screened.hg19.pair.2.fq.gz"""
[Fri 26-10-2018 19:56:09] Line 7: Executing paired on "/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane8.screened.adapter.screened.hg19.pair.1.fq.gz""/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane8.screened.adapter.screened.hg19.pair.2.fq.gz"""
[Fri 26-10-2018 19:56:09] Line 7: Executing paired on "/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-3_lane8.screened.adapter.screened.hg19.pair.1.fq.gz""/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-3_lane8.screened.adapter.screened.hg19.pair.2.fq.gz"""
[Fri 26-10-2018 19:56:09] Line 9: Interpreting [interpretIO]: input = preprocess(Lookup 'input' as NGLReadSet; __input_qc=True; keep_singles=True)using {Block {blockVariable = [Variable "read"], blockBody = Sequence [Optimized (SubstrimReassign (Variable "read") 25),Optimized (LenThresholdDiscard (Variable "read") BOpLT 45)]}}
[Fri 26-10-2018 19:56:09] Line 9: Interpreting [assignment]: preprocess(Lookup 'input' as NGLReadSet; __input_qc=True; keep_singles=True)using {Block {blockVariable = [Variable "read"], blockBody = Sequence [Optimized (SubstrimReassign (Variable "read") 25),Optimized (LenThresholdDiscard (Variable "read") BOpLT 45)]}}
[Fri 26-10-2018 19:56:09] Line 9: Created & opened temporary file /scratch/milanese/NGless_temp_dir/preprocessed.1...fq13665-2.gz
[Fri 26-10-2018 19:56:09] Line 9: Created & opened temporary file /scratch/milanese/NGless_temp_dir/preprocessed.2...fq13665-3.gz
[Fri 26-10-2018 19:56:09] Line 9: Created & opened temporary file /scratch/milanese/NGless_temp_dir/preprocessed.singles...fq13665-4.gz
[Fri 26-10-2018 19:56:13] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.pair.1.fq.gz
[Fri 26-10-2018 19:56:13] Line 9: Number of base pairs: 95
[Fri 26-10-2018 19:56:13] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:13] Line 9: Number of sequences: 629462
[Fri 26-10-2018 19:56:13] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.pair.2.fq.gz
[Fri 26-10-2018 19:56:13] Line 9: Number of base pairs: 95
[Fri 26-10-2018 19:56:13] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:13] Line 9: Number of sequences: 629462
[Fri 26-10-2018 19:56:16] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane6.screened.adapter.screened.hg19.pair.1.fq.gz
[Fri 26-10-2018 19:56:16] Line 9: Number of base pairs: 95
[Fri 26-10-2018 19:56:16] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:16] Line 9: Number of sequences: 569180
[Fri 26-10-2018 19:56:16] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane6.screened.adapter.screened.hg19.pair.2.fq.gz
[Fri 26-10-2018 19:56:16] Line 9: Number of base pairs: 95
[Fri 26-10-2018 19:56:16] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:16] Line 9: Number of sequences: 569180
[Fri 26-10-2018 19:56:19] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane7.screened.adapter.screened.hg19.pair.1.fq.gz
[Fri 26-10-2018 19:56:19] Line 9: Number of base pairs: 94
[Fri 26-10-2018 19:56:19] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:19] Line 9: Number of sequences: 365987
[Fri 26-10-2018 19:56:19] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane7.screened.adapter.screened.hg19.pair.2.fq.gz
[Fri 26-10-2018 19:56:19] Line 9: Number of base pairs: 94
[Fri 26-10-2018 19:56:19] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:19] Line 9: Number of sequences: 365987
[Fri 26-10-2018 19:56:21] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane8.screened.adapter.screened.hg19.pair.1.fq.gz
[Fri 26-10-2018 19:56:21] Line 9: Number of base pairs: 94
[Fri 26-10-2018 19:56:21] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:21] Line 9: Number of sequences: 353423
[Fri 26-10-2018 19:56:21] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane8.screened.adapter.screened.hg19.pair.2.fq.gz
[Fri 26-10-2018 19:56:21] Line 9: Number of base pairs: 94
[Fri 26-10-2018 19:56:21] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:21] Line 9: Number of sequences: 353423
[Fri 26-10-2018 19:56:21] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-3_lane8.screened.adapter.screened.hg19.pair.1.fq.gz
[Fri 26-10-2018 19:56:21] Line 9: Number of base pairs: 94
[Fri 26-10-2018 19:56:21] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:21] Line 9: Number of sequences: 63941
[Fri 26-10-2018 19:56:21] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-3_lane8.screened.adapter.screened.hg19.pair.2.fq.gz
[Fri 26-10-2018 19:56:21] Line 9: Number of base pairs: 94
[Fri 26-10-2018 19:56:21] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:21] Line 9: Number of sequences: 63941
[Fri 26-10-2018 19:56:22] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.singles.fq.gz
[Fri 26-10-2018 19:56:22] Line 9: Number of base pairs: 95
[Fri 26-10-2018 19:56:22] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:22] Line 9: Number of sequences: 145122
[Fri 26-10-2018 19:56:22] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-1_lane6.screened.adapter.screened.hg19.singles.fq.gz
[Fri 26-10-2018 19:56:22] Line 9: Number of base pairs: 95
[Fri 26-10-2018 19:56:22] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:22] Line 9: Number of sequences: 139763
[Fri 26-10-2018 19:56:23] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane7.screened.adapter.screened.hg19.singles.fq.gz
[Fri 26-10-2018 19:56:23] Line 9: Number of base pairs: 94
[Fri 26-10-2018 19:56:23] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:23] Line 9: Number of sequences: 270559
[Fri 26-10-2018 19:56:24] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-2_lane8.screened.adapter.screened.hg19.singles.fq.gz
[Fri 26-10-2018 19:56:24] Line 9: Number of base pairs: 94
[Fri 26-10-2018 19:56:24] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:24] Line 9: Number of sequences: 298934
[Fri 26-10-2018 19:56:28] Line 9: Simple Statistics completed for: /g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0/CCIS12370844ST-4-0_11s002754-1-3_lane8.screened.adapter.screened.hg19.singles.fq.gz
[Fri 26-10-2018 19:56:28] Line 9: Number of base pairs: 94
[Fri 26-10-2018 19:56:28] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:28] Line 9: Number of sequences: 965214
[Fri 26-10-2018 19:56:28] Line 9: Preprocess finished
[Fri 26-10-2018 19:56:28] Line 9: Simple Statistics completed for: preproc.lno9.pairs.1
[Fri 26-10-2018 19:56:28] Line 9: Number of base pairs: 95
[Fri 26-10-2018 19:56:28] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:28] Line 9: Number of sequences: 1797171
[Fri 26-10-2018 19:56:28] Line 9: Simple Statistics completed for: preproc.lno9.pairs.2
[Fri 26-10-2018 19:56:28] Line 9: Number of base pairs: 95
[Fri 26-10-2018 19:56:28] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:28] Line 9: Number of sequences: 1797171
[Fri 26-10-2018 19:56:28] Line 9: Simple Statistics completed for: preproc.lno9.singles
[Fri 26-10-2018 19:56:28] Line 9: Number of base pairs: 95
[Fri 26-10-2018 19:56:28] Line 9: Encoding is: SangerEncoding
[Fri 26-10-2018 19:56:28] Line 9: Number of sequences: 1733208
[Fri 26-10-2018 19:56:28] Line 13: Interpreting [interpretIO]: mapped = map(Lookup 'input' as NGLReadSet; reference="hg19")
[Fri 26-10-2018 19:56:28] Line 13: Interpreting [assignment]: map(Lookup 'input' as NGLReadSet; reference="hg19")
[Fri 26-10-2018 19:56:28] Line 13: Looked for hg19 in directory /home/milanese/.local/share/ngless/data/References/hg19 (and did not find it)
[Fri 26-10-2018 19:56:28] Line 13: Looked for hg19 in directory /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data/References/hg19 (and found it)
[Fri 26-10-2018 19:56:28] Line 13: Index for /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz already exists.
[Fri 26-10-2018 19:56:28] Line 13: Created & opened temporary file /scratch/milanese/NGless_temp_dir/mapped_reference.13665-5.sam
[Fri 26-10-2018 19:56:28] Line 13: Starting mapping to /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz
[Fri 26-10-2018 19:56:28] Line 13: Calling: /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/bin/ngless-0.9.1-bwa mem -t 8 /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz /scratch/milanese/NGless_temp_dir/preprocessed.1...fq13665-2.gz /scratch/milanese/NGless_temp_dir/preprocessed.2...fq13665-3.gz
[Fri 26-10-2018 20:04:15] Line 13: BWA info: [M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 892496 sequences (80000139 bp)...
[M::process] read 891218 sequences (80000171 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (522, 76195, 560, 526)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (1961, 2985, 3987)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 8039)
[M::mem_pestat] mean and std.dev: (2742.86, 1187.82)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 10065)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (68, 71, 75)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (54, 89)
[M::mem_pestat] mean and std.dev: (71.31, 5.06)
[M::mem_pestat] low and high boundaries for proper pairs: (47, 96)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (2, 26, 709)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 2123)
[M::mem_pestat] mean and std.dev: (423.81, 614.14)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2880)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (2068, 3018, 3987)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 7825)
[M::mem_pestat] mean and std.dev: (2867.86, 1092.02)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 9744)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 892496 reads in 911.429 CPU sec, 116.107 real sec
[M::process] read 923126 sequences (80000054 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (519, 77173, 543, 523)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (1914, 2649, 3978)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 8106)
[M::mem_pestat] mean and std.dev: (2625.26, 1303.36)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 10170)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (68, 71, 75)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (54, 89)
[M::mem_pestat] mean and std.dev: (71.30, 5.06)
[M::mem_pestat] low and high boundaries for proper pairs: (47, 96)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (2, 34, 587)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1757)
[M::mem_pestat] mean and std.dev: (367.36, 546.33)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2553)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (2067, 3035, 3991)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 7839)
[M::mem_pestat] mean and std.dev: (2893.28, 1093.52)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 9763)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 891218 reads in 909.287 CPU sec, 114.563 real sec
[M::process] read 887502 sequences (72012857 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (477, 90794, 739, 445)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (1892, 2646, 3973)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 8135)
[M::mem_pestat] mean and std.dev: (2577.37, 1311.18)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 10216)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (67, 70, 74)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (53, 88)
[M::mem_pestat] mean and std.dev: (70.42, 5.44)
[M::mem_pestat] low and high boundaries for proper pairs: (46, 95)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (2, 10, 442)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1322)
[M::mem_pestat] mean and std.dev: (137.12, 290.42)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1762)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (2086, 2915, 3982)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 7774)
[M::mem_pestat] mean and std.dev: (2853.81, 1102.72)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 9670)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 923126 reads in 903.050 CPU sec, 113.579 real sec
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (492, 82359, 982, 577)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (1924, 2694, 3818)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 7606)
[M::mem_pestat] mean and std.dev: (2653.04, 1192.80)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 9500)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (53, 69, 73)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (13, 113)
[M::mem_pestat] mean and std.dev: (62.02, 16.00)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 133)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (3, 11, 327)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 975)
[M::mem_pestat] mean and std.dev: (102.18, 214.13)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1299)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (2092, 3027, 3976)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 7744)
[M::mem_pestat] mean and std.dev: (2865.38, 1089.69)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 9628)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 887502 reads in 783.114 CPU sec, 98.607 real sec
[main] Version: 0.7.17-r1188
[main] CMD: /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/bin/ngless-0.9.1-bwa mem -t 8 /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz /scratch/milanese/NGless_temp_dir/preprocessed.1...fq13665-2.gz /scratch/milanese/NGless_temp_dir/preprocessed.2...fq13665-3.gz
[main] Real time: 466.797 sec; CPU: 3512.062 sec

[Fri 26-10-2018 20:04:15] Line 13: Finished mapping to /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz
[Fri 26-10-2018 20:04:15] Line 13: Starting mapping to /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz
[Fri 26-10-2018 20:04:15] Line 13: Calling: /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/bin/ngless-0.9.1-bwa mem -t 8 /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz /scratch/milanese/NGless_temp_dir/preprocessed.singles...fq13665-4.gz
[Fri 26-10-2018 20:06:32] Line 13: BWA info: [M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 1040202 sequences (80000176 bp)...
[M::process] read 693006 sequences (57701577 bp)...
[M::mem_process_seqs] Processed 1040202 reads in 554.027 CPU sec, 70.580 real sec
[M::mem_process_seqs] Processed 693006 reads in 463.502 CPU sec, 59.181 real sec
[main] Version: 0.7.17-r1188
[main] CMD: /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/bin/ngless-0.9.1-bwa mem -t 8 /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz /scratch/milanese/NGless_temp_dir/preprocessed.singles...fq13665-4.gz
[main] Real time: 137.083 sec; CPU: 1022.007 sec

[Fri 26-10-2018 20:06:32] Line 13: Finished mapping to /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz
[Fri 26-10-2018 20:06:32] Line 13: Finished mapping to /g/scb2/zeller/milanese/software/ANACONDA/install/bin/../share/ngless/data/References/hg19/Sequence/BWAIndex/reference.fa.gz
[Fri 26-10-2018 20:06:32] Line 13: Total reads: 3530379
[Fri 26-10-2018 20:06:32] Line 13: Total reads aligned: 2173241 [61.56%]
[Fri 26-10-2018 20:06:32] Line 13: Total reads Unique map: 1795123 [50.85%]
[Fri 26-10-2018 20:06:32] Line 13: Total reads Non-Unique map: 378118 [10.71%]
[Fri 26-10-2018 20:06:32] Line 15: Interpreting [interpretIO]: mapped = select(Lookup 'mapped' as NGLMappedReadSet)using {Block {blockVariable = [Variable "mr"], blockBody = Sequence [mr = (Lookup 'mr' as NGLMappedRead).MethodName {unwrapMethodName = "filter"}( Nothing; min_match_size=45; min_identity_pc=90; action={unmatch} ),if [(Lookup 'mr' as NGLMappedRead).MethodName {unwrapMethodName = "flag"}( Just {mapped} )] then {Sequence [discard]} else {Sequence []}]}}
[Fri 26-10-2018 20:06:32] Line 15: Interpreting [assignment]: select(Lookup 'mapped' as NGLMappedReadSet)using {Block {blockVariable = [Variable "mr"], blockBody = Sequence [mr = (Lookup 'mr' as NGLMappedRead).MethodName {unwrapMethodName = "filter"}( Nothing; min_match_size=45; min_identity_pc=90; action={unmatch} ),if [(Lookup 'mr' as NGLMappedRead).MethodName {unwrapMethodName = "flag"}( Just {mapped} )] then {Sequence [discard]} else {Sequence []}]}}
[Fri 26-10-2018 20:06:32] Line 15: Executing blocked select on file /scratch/milanese/NGless_temp_dir/mapped_reference.13665-5.sam
[Fri 26-10-2018 20:06:32] Line 15: Created & opened temporary file /scratch/milanese/NGless_temp_dir/block_selected_mapped_reference13665-6.sam
[Fri 26-10-2018 20:06:42] Line 20: Interpreting [interpretIO]: temp$4 = as_reads(Lookup 'mapped' as NGLMappedReadSet)
[Fri 26-10-2018 20:06:42] Line 20: Interpreting [assignment]: as_reads(Lookup 'mapped' as NGLMappedReadSet)
[Fri 26-10-2018 20:06:42] Line 20: Interpreting [executing module function: 'as_reads']: NGOMappedReadSet {nglgroupName = "/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0", nglSamFile = File /scratch/milanese/NGless_temp_dir/block_selected_mapped_reference13665-6.sam, nglReference = Just "hg19"}
[Fri 26-10-2018 20:06:42] Line 20: Created & opened temporary file /scratch/milanese/NGless_temp_dir/reads_block_selected_mapped_reference13665-6..1.fq13665-7.gz
[Fri 26-10-2018 20:06:42] Line 20: Created & opened temporary file /scratch/milanese/NGless_temp_dir/reads_block_selected_mapped_reference13665-6..2.fq13665-8.gz
[Fri 26-10-2018 20:06:42] Line 20: Created & opened temporary file /scratch/milanese/NGless_temp_dir/reads_block_selected_mapped_reference13665-6..singles.fq13665-9.gz
[Fri 26-10-2018 20:06:50] Line 20: Finished as_reads
[Fri 26-10-2018 20:06:51] Line 20: Heuristic for FastQ encoding determination for file "/scratch/milanese/NGless_temp_dir/reads_block_selected_mapped_reference13665-6..1.fq13665-7.gz" cannot be 100% confident. Guessing 33 offset (Sanger encoding, used by newer Illumina machines).
[Fri 26-10-2018 20:06:51] Line 20: Interpreting [interpretIO]: write(Lookup 'temp$4' as NGLReadSet; __can_move=True; __hash="5ebc69331546eebb8a8b9049e8e4e616"; ofile=Lookup 'sample' as NGLStringBOpAdd"/"BOpAddLookup 'sample' as NGLStringBOpAdd".filtered.fq.gz")
[Fri 26-10-2018 20:06:51] Line 20: Interpreting [write]: NGOReadSet "/g/scb2/zeller/SHARED/DATA/metaG/REAL/CRC-META/FR-CRC_N156/CCIS12370844ST-4-0" (ReadSet {pairedSamples = [(FastQFilePath {fqpathEncoding = SangerEncoding, fqpathFilePath = "/scratch/milanese/NGless_temp_dir/reads_block_selected_mapped_reference13665-6..1.fq13665-7.gz"},FastQFilePath {fqpathEncoding = SangerEncoding, fqpathFilePath = "/scratch/milanese/NGless_temp_dir/reads_block_selected_mapped_reference13665-6..2.fq13665-8.gz"})], singleSamples = [FastQFilePath {fqpathEncoding = SangerEncoding, fqpathFilePath = "/scratch/milanese/NGless_temp_dir/reads_block_selected_mapped_reference13665-6..singles.fq13665-9.gz"}]})
Exiting after internal error. If you can reproduce this issue, please run your script with the --trace flag and report a bug at http://github.com/luispedro/ngless/issues
/scratch/milanese/NGless_temp_dir/reads_block_selected_mapped_reference13665-6..1.fq13665-7.gz: renameFile:renamePath:rename: does not exist (No such file or directory)
luispedro commented 6 years ago

Thanks for the excellent report, @AlessioMilanese

luispedro commented 6 years ago

I managed to reproduce it locally with v0.9.1, but it works correctly on master. I am still looking into it, but it might be worthwhile just to do a new release.

luispedro commented 6 years ago

Could this possibly be a permission error with an erroneous error message? Does your user have permissions to create the output files inside the output directory?

AlessioMilanese commented 6 years ago

I have write permission. I can also reproduce the error with just one lane:

CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.pair.1.fq.gz
CCIS12370844ST-4-0_11s002754-1-1_lane5.screened.adapter.screened.hg19.pair.2.fq.gz

and also if I rename the samples to:

a_1.fastq.gz
a_2.fastq.gz

The error says

/scratch/milanese/NGless_temp_dir/reads_block_selected_mapped_reference7827-4..1.fq7827-5.gz: renameFile:renamePath:rename: does not exist (No such file or directory)

but when I check reads_block_selected_mapped_reference7827-4..1.fq7827-5.gz before the script finishes, I can see it:

-bash-4.2$ ls -lah /scratch/milanese/NGless_temp_dir/
total 648M
drwxr-xr-x.  2 milanese zeller    8 Oct 28 22:37 .
drwxr-xr-x. 16 milanese zeller   18 Oct 26 14:14 ..
-rw-r--r--.  1 milanese zeller 142M Oct 28 22:37 block_selected_mapped_reference7827-4.sam
-rw-r--r--.  1 milanese zeller 349M Oct 28 22:37 mapped_reference.7827-3.sam
-rw-r--r--.  1 milanese zeller  46M Oct 28 22:34 preprocessed.1...fq7827-0.gz
-rw-r--r--.  1 milanese zeller  46M Oct 28 22:34 preprocessed.2...fq7827-1.gz
-rw-r--r--.  1 milanese zeller 2.6M Oct 28 22:34 preprocessed.singles...fq7827-2.gz
-rw-r--r--.  1 milanese zeller  22M Oct 28 22:37 reads_block_selected_mapped_reference7827-4..1.fq7827-5.gz
-rw-r--r--.  1 milanese zeller  22M Oct 28 22:37 reads_block_selected_mapped_reference7827-4..2.fq7827-6.gz
-rw-r--r--.  1 milanese zeller    0 Oct 28 22:37 reads_block_selected_mapped_reference7827-4..singles.fq7827-7.gz

It does indeed seem to be a problem with moving the files to the final destination.

AlessioMilanese commented 6 years ago

If I change the name of the directory from CCIS12370844ST-4-0 to aa and the file list to:

aa

it seems to work. Can you reproduce it?

luispedro commented 5 years ago

Okay, I figured this out. It's a mix of four things: (1) what I think is a bug in your script; (2) NGLess does not correctly check whether the output directory exists; (3) it then outputs a horrible error message; and (4) that message is partially caused by another bug in the Haskell POSIX library (https://github.com/haskell/unix/issues/60).

You were doing:

samples = readlines(ARGV[2])
sample = lock1(samples)
input = load_mocat_sample(ARGV[1] + '/' + sample)

You load the data from ARGV[1] </> sample, but later write to another directory:

write(as_reads(mapped), ofile=sample+'/'+sample+'.filtered.fq.gz')

That is, just sample, without the ARGV[1] prefix.

If this directory does not exist, you get the stupid behaviour you saw.
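
To illustrate why the message names a file that clearly exists, here is a minimal Python sketch (not NGLess code; the file and directory names are made up for the example). On a POSIX system, renaming an existing file into a directory that does not exist fails with ENOENT ("No such file or directory"), even though the source file is still there. If the error is then reported with only the source path, it looks like the temporary file has gone missing.

```python
import errno
import os
import tempfile


def rename_into_missing_dir():
    """Try to move an existing file into a directory that does not exist.

    Returns (errno of the failure or None, whether the source still exists).
    """
    with tempfile.TemporaryDirectory() as tmp:
        # The source file exists, like the temporary reads file in the listing.
        src = os.path.join(tmp, "reads.fq.gz")
        with open(src, "wb"):
            pass

        # The destination directory was never created (hypothetical name).
        dst = os.path.join(tmp, "missing_sample_dir", "reads.filtered.fq.gz")

        try:
            os.rename(src, dst)
        except OSError as err:
            # The rename failed; check whether the source is still in place.
            return err.errno, os.path.exists(src)
        return None, os.path.exists(src)


code, source_still_there = rename_into_missing_dir()
print(code == errno.ENOENT, source_still_there)
```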

I am re-classifying this bug as "better-error-message".
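
Until the error message is improved, one possible workaround is to create the per-sample output directories before submitting the job. The sketch below is a hypothetical helper, not part of NGLess: `ensure_output_dirs` and its `base_dir` parameter are names invented here. It assumes the list file holds one sample name per line and that the relative `ofile` path is resolved against the directory NGLess is started from.

```python
import os


def ensure_output_dirs(sample_list_path, base_dir="."):
    """Create one output directory per sample name listed in the file.

    Returns the list of directories that now exist.
    """
    created = []
    with open(sample_list_path) as handle:
        for line in handle:
            sample = line.strip()
            if not sample:
                continue  # skip blank lines
            out_dir = os.path.join(base_dir, sample)
            # exist_ok=True makes the call safe to repeat across runs.
            os.makedirs(out_dir, exist_ok=True)
            created.append(out_dir)
    return created
```

For the script in this issue, that would mean calling `ensure_output_dirs("/scratch/milanese/TEST/QC/list2")` from the directory where the job is launched. The alternative is to change the `ofile` argument so that it points into a directory that already exists, such as the one under ARGV[1].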