sergiolitwiniuk85 / nf-lula

proof-of-concept #1

Open sergiolitwiniuk85 opened 7 months ago

sergiolitwiniuk85 commented 7 months ago

- Download 3 reference genomes: GRCh37, GRCh38, and T2T.

- Split each genome into 300 bp chunks (to do: check chunk_size).

- Align each set of chunks to each reference using fastq_screen and find the chunks that match only one reference genome.

- Validate the unique markers against BAM files whose reference is known (checker script).
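The splitting step could be sketched as below. This is only a minimal sketch, not the script used here: the function name `split_fasta_chunks` and the `name:start-end` header layout are hypothetical choices, the windows are non-overlapping, and the whole sequence of each record is held in memory by awk.

```shell
#!/usr/bin/env bash
# Sketch: split every record of a FASTA file into fixed-size, non-overlapping chunks.
# split_fasta_chunks and the "name:start-end" headers are hypothetical, for illustration.
split_fasta_chunks() {
    local fasta="$1" chunk_size="${2:-300}"
    awk -v size="$chunk_size" '
        function emit(   i, len, last) {
            len = length(seq)
            # Walk the sequence in windows; the final window may be shorter than size.
            for (i = 1; i <= len; i += size) {
                last = (i + size - 1 < len) ? i + size - 1 : len
                printf(">%s:%d-%d\n%s\n", name, i, last, substr(seq, i, size))
            }
            seq = ""
        }
        /^>/ {
            if (name != "") emit()
            name = substr($1, 2)   # keep the ID up to the first whitespace
            next
        }
        { seq = seq $0 }           # join wrapped sequence lines
        END { if (name != "") emit() }
    ' "$fasta"
}
```

For example, `split_fasta_chunks your_input.fasta 300 > your_chunks.fasta` would write the 300 bp chunks to a new FASTA file (both file names are placeholders).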

sergiolitwiniuk85 commented 7 months ago

Convert a FASTA file to a FASTQ file with a constant quality score using seqtk:

```
seqtk seq -F 'I' your_input.fasta > your_output.fastq
```

sergiolitwiniuk85 commented 7 months ago

Build the bowtie2 indexes of the genomes in preparation for fastq_screen:

nohup sh -c 'bowtie2-build --threads 32 38/GCF_000001405.40_GRCh38.p14_genomic.fna 38/GCF_000001405.40_GRCh38.p14_genomic && bowtie2-build --threads 32 t2t/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna t2t/GCF_009914755.1_T2T-CHM13v2.0_genomic' &
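fastq_screen finds these indexes through the file passed with `--conf`. A minimal sketch of such a file is shown below, assuming fastq_screen is run from the directory that contains `38/` and `t2t/`. The labels `GRCh38` and `T2T` are placeholders, and the paths are the index basenames given to `bowtie2-build` above.

```text
# fastq_screen.conf (sketch; the labels are placeholders)
THREADS 32
DATABASE	GRCh38	38/GCF_000001405.40_GRCh38.p14_genomic
DATABASE	T2T	t2t/GCF_009914755.1_T2T-CHM13v2.0_genomic
```

The `--filter` string takes one character per `DATABASE` line, in the order listed, so with two entries a filter of `00` keeps the reads that map to neither genome.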

sergiolitwiniuk85 commented 7 months ago

Running fastq_screen:

```
nohup sh -c "fastq_screen --conf fastq_screen.conf --tag --filter '00' --outdir out_38 splitted_fastq/37_genomic.splitted.fastq"
```
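To see how many chunks survive the filter, the read counts of the input FASTQ and the filtered FASTQ written to `out_38` can be compared. The helper below is a hypothetical sketch that assumes uncompressed, four-line FASTQ records. The name of the filtered file depends on the fastq_screen version, so check the output directory rather than relying on a fixed name.

```shell
#!/usr/bin/env bash
# Sketch: count the reads in an uncompressed FASTQ file (4 lines per record).
# count_fastq_records is a hypothetical helper, not part of fastq_screen.
count_fastq_records() {
    awk 'END { print int(NR / 4) }' "$1"
}
```

Running `count_fastq_records splitted_fastq/37_genomic.splitted.fastq` and then the same function on the filtered file gives the two numbers to compare.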