j23414 closed 1 year ago
Went with a small genome.
Adding some notes here from a debugging session.
Illumina and PacBio reads have been subset into three files that map to Hzea Chr 1. Thanks, Ben and Amanda.
Param | Files |
---|---|
--illumina_reads | "testpolish_{R1,R2}.fq" |
--pacbio_reads | "test.1.filtered.bam" |
These are passed to all three cases:
Case 1: Should only require the addition of the assembly file (Hzea Chr 1) and a mitochondrial file.
nextflow run isugifNF/polishCLR -r main \
--primary_assembly "GCF_022581195.2_ilHelZeax1.1_chr1.fa" \ <== HERE
--mitochondrial_assembly "GCF_022581195.2_ilHelZeax1.1_mito.fa" \ <== HERE
--illumina_reads "testpolish_{R1,R2}.fq" \
--pacbio_reads "test.1.filtered.bam" \
Case 2: Should require the Case 1 files, plus the alternate assembly file (chr 1) from FALCON-Unzip that has not been polished.
nextflow run isugifNF/polishCLR -r main \
--primary_assembly "GCF_022581195.2_ilHelZeax1.1_chr1.fa" \ <== HERE
--alternate_assembly "data/alternate.fasta" \ <== pull from Hzea from 3-unzip folder
--mitochondrial_assembly "GCF_022581195.2_ilHelZeax1.1_mito.fa" \ <== HERE
--illumina_reads "testpolish_{R1,R2}.fq" \
--pacbio_reads "test.1.filtered.bam" \
Case 3: Should require the Case 1 files, plus the alternate assembly file (chr 1) from FALCON-Unzip that has been polished.
nextflow run isugifNF/polishCLR -r main \
--primary_assembly "GCF_022581195.2_ilHelZeax1.1_chr1.fa" \ <== HERE
--alternate_assembly "data/alternate.fasta" \ <== pull from Hzea from 4-polish folder
--mitochondrial_assembly "GCF_022581195.2_ilHelZeax1.1_mito.fa" \ <== HERE
--illumina_reads "testpolish_{R1,R2}.fq" \
--pacbio_reads "test.1.filtered.bam" \
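
The three commands above share four arguments and differ only in whether `--alternate_assembly` is passed, so they could probably be generated from one helper. A rough sketch, assuming the flags and file names exactly as written in these notes; the function name `polishclr_cmd` and the case numbering are made up for illustration, and it only prints the command rather than running Nextflow:

```shell
# Sketch only: print (do not run) the polishCLR command for test case 1, 2, or 3.
# Flags and file names are copied from the notes above; polishclr_cmd is a
# hypothetical helper, not part of the pipeline.
polishclr_cmd() {
  case_id="$1"
  # Arguments shared by all three cases.
  set -- nextflow run isugifNF/polishCLR -r main \
    --primary_assembly "GCF_022581195.2_ilHelZeax1.1_chr1.fa" \
    --mitochondrial_assembly "GCF_022581195.2_ilHelZeax1.1_mito.fa" \
    --illumina_reads "testpolish_{R1,R2}.fq" \
    --pacbio_reads "test.1.filtered.bam"
  case "$case_id" in
    1) ;;  # no alternate assembly
    2|3) set -- "$@" --alternate_assembly "data/alternate.fasta" ;;
    *) echo "unknown case: $case_id" >&2; return 1 ;;
  esac
  # Single-quote every argument so the {R1,R2} glob survives copy and paste.
  for arg in "$@"; do
    printf "'%s' " "$arg"
  done
  printf '\n'
}

polishclr_cmd 2
```

Cases 2 and 3 print the same command here because both notes use `data/alternate.fasta`; which file sits at that path (3-unzip vs 4-polish) still has to be handled outside the helper.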
Do we want to consider providing a minimal paternal/maternal trio dataset?
Just saw, thanks!
Smaller test datasets have been added to https://data.nal.usda.gov/dataset/data-polishclr-example-input-genome-assemblies
[ NOTE - Data files added 2022-11-01:
@j23414, can you work these into the continuous integration workflow? I'll add them to the documentation!
On it, thank you!
Just an update that I'm running the CI tests in a separate repo before merging. Want to check for data transfer/runtime limits.
I'm vetoing running the test data in CI, since the 6 hrs needed to download and run it in GitHub CI would delay testing and merging code.
Nextflow stub test cancelled in 6h 0m 15s
Remaining tasks for this issue include adding test data instructions at the top of
@Astahlke @Sivanandan can we close this issue?
Yep! I think we can.
Either as a small genome, or one of several simulated genome options
Option 1: Near-ideal case with no repeated sequences in the whole genome, e.g. ACGT AACCGGTT AAACCCGGGTTT... (avoids short reads mapping to multiple locations)
Option 2: Same as option 1, but introduce random errors
Option 3: Same as option 1, but introduce polyploidy
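
For what it's worth, Options 1 and 2 can be sketched in a few lines of shell. This is only an illustration of the pattern written above (blocks of A, C, G, T whose run length grows by one each round), plus a seeded substitution step for the random errors; the function names `simulate_genome` and `add_errors`, the substitution-only error model, and the rate/seed parameters are all assumptions, and Option 3 is not covered:

```shell
# Option 1 sketch: for k = 1..n emit k A's, k C's, k G's, k T's
# (ACGT, then AACCGGTT, then AAACCCGGGTTT, ...), joined into one sequence.
simulate_genome() {
  n="$1"
  k=1
  seq_out=""
  while [ "$k" -le "$n" ]; do
    for base in A C G T; do
      i=0
      while [ "$i" -lt "$k" ]; do
        seq_out="${seq_out}${base}"
        i=$((i + 1))
      done
    done
    k=$((k + 1))
  done
  printf '%s\n' "$seq_out"
}

# Option 2 sketch: read a sequence on stdin and replace each base with a
# different base with probability `rate`, using a fixed `seed` so the
# result is repeatable on a given awk. Substitutions only, no indels.
add_errors() {
  rate="$1"
  seed="$2"
  awk -v rate="$rate" -v seed="$seed" '
    BEGIN { srand(seed); split("A C G T", bases, " ") }
    {
      out = ""
      for (i = 1; i <= length($0); i++) {
        b = substr($0, i, 1)
        if (rand() < rate) {
          # Redraw until the replacement differs from the original base.
          do { r = bases[int(rand() * 4) + 1] } while (r == b)
          b = r
        }
        out = out b
      }
      print out
    }'
}

simulate_genome 3
simulate_genome 3 | add_errors 0.05 42
```

Prefixing the output with a `>name` header line would turn it into a FASTA record for use as an assembly input.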