institut-de-genomique / NaS

NaS is a hybrid approach developed to take advantage of data generated using the MinION device. We combine Illumina and Oxford Nanopore technologies to produce NaS (Nanopore Synthetic-long) reads of up to 60 kb that aligned with no error to the reference genome and spanned repetitive regions.
http://www.genoscope.cns.fr/nas/

source code? #10

Closed dougwyu closed 6 years ago

dougwyu commented 6 years ago

Hi! I would like to see the source code. Is that possible? In my situation, I really only need to see your code for aligning Illumina reads to MinION reads (using BLAT, if I've understood correctly). I'm trying to use unassembled Illumina reads, individually sequenced from different species, to identify MinION reads from mixed-species samples. The idea is to see which set of Illumina reads (each set = 1 species) maps "best" to each MinION read. "Best" is still to be defined, but presumably will be some combination of high coverage and low standard deviation of mapped Illumina read depth across each MinION read. Of course, some (large) proportion of MinION reads won't be ID'd properly, but that should be acceptable.
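
One way the "best" criterion above might be sketched, once a headerless PSL file from BLAT is available with MinION reads as targets and Illumina reads as queries. This is only an illustration under assumptions: `coverage_stats` is a hypothetical helper name, depth is counted over the whole aligned span of each hit (gaps inside an alignment are ignored), and the per-base bookkeeping is memory-hungry on large inputs.

```shell
#!/bin/sh
# Sketch: per-MinION-read mean depth and standard deviation of depth from a
# headerless PSL file (BLAT -noHead), where MinION reads are the targets.
# coverage_stats is a hypothetical helper name, not part of NaS.
coverage_stats() {
    awk -F '\t' '
        {
            size[$14] = $15      # tSize: length of the MinION read
            # Count every target base from tStart (0-based) to tEnd
            # (exclusive). Assumption: gaps inside a hit are ignored.
            for (p = $16 + 0; p < $17 + 0; p++) depth[$14, p]++
        }
        END {
            for (t in size) {
                sum = 0; sumsq = 0
                for (p = 0; p < size[t] + 0; p++) {
                    d = depth[t, p] + 0
                    sum += d; sumsq += d * d
                }
                mean = sum / size[t]
                var = sumsq / size[t] - mean * mean
                if (var < 0) var = 0     # guard against rounding error
                # Columns: read name, read length, mean depth, depth SD
                printf "%s\t%d\t%.2f\t%.2f\n", t, size[t], mean, sqrt(var)
            }
        }
    ' "$1"
}
```

Running `coverage_stats species_A.psl` once per species (the file name is a placeholder) would give one row per MinION read, which could then be compared across species. The output order is not guaranteed, so sort it before comparing.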

bistace commented 6 years ago

Hi, this is a public GitHub repository, so anyone can access the source code. To save you some time, here is the command line that we use to align Illumina reads to the Nanopore reads in the NaS_wrapped source code file:

cat $OUTPUT_DIR/tmp/ILMN_reads.fa | parallel -j $NB_PROC --cat --pipe --block 10M --recstart ">" "$BLAT -tileSize=$TILE -stepSize=$STEP -noHead $NANO_READS {} $OUTPUT_DIR/tmp/psl/blat-alignment.job{#}.tile$TILE.step$STEP.psl" >$OUTPUT_DIR/tmp/blat-alignment.stderr 

I think that the variable names are self-explanatory, but if you need more help, do not hesitate to ask.

dougwyu commented 6 years ago

Thank you very much @bistace.

My mistake was to download the zip file. All the NaS modules were in binary format.

I have now used Github Desktop to sync, and I can see the source code for most of the modules.

However, I cannot see the source code for extract_reads, which I think must refer to your code for extracting the reads that successfully map to each MinION read?

bistace commented 6 years ago

Hi, the extract_reads binary file is a part of the compareads2 tool available here http://colibread.inria.fr/software/compareads.

This piece of code is no longer used in NaS and was used to retrieve the sequence of reads that shared similar k-mers after the execution of compareads.

You can use the following command lines to grab the names of the Illumina reads that mapped to each MinION read:

mkdir output_dir
cat your_psl_file.psl | awk -v PFX=output_dir '{ file=PFX"/"$14".psl"; print $0>file; }'

This will create one file per MinION read (PSL column 14, the target name), containing the PSL alignment lines, and therefore the names (column 10), of the Illumina reads that mapped to that particular read.
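
If only the Illumina read names are wanted rather than the full PSL lines, a variant along these lines might work. It is a sketch, not part of NaS: `list_illumina_names` and the `.names` suffix are hypothetical, it assumes a tab-separated PSL file without a header (as produced with `-noHead`), and it assumes read names contain no `/` character. Closing each file after writing avoids hitting the open-file limit when there are many MinION reads.

```shell
#!/bin/sh
# Sketch: write one file per MinION read (PSL column 14, tName) listing the
# unique Illumina read names (PSL column 10, qName) aligned to it.
# list_illumina_names is a hypothetical helper name, not part of NaS.
list_illumina_names() {
    psl_file=$1
    out_dir=$2
    mkdir -p "$out_dir"
    awk -F '\t' -v PFX="$out_dir" '
        # Skip repeated (target, query) pairs so each name is listed once.
        !seen[$14, $10]++ {
            file = PFX "/" $14 ".names"
            print $10 >> file
            close(file)    # avoid running out of open file descriptors
        }
    ' "$psl_file"
}
```

For example, `list_illumina_names your_psl_file.psl output_dir` would fill `output_dir` with one `.names` file per MinION read. Because the files are opened in append mode, the output directory should be empty before each run.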

dougwyu commented 6 years ago

Thank you again @bistace