hwc2021 / GSAT

Graph-based Sequence Assembly Toolkit
MIT License

An input file problem #3

Open UncleGua opened 1 year ago

UncleGua commented 1 year ago

Hi, in the graphShort session, is the required input file mitochondrial genome sequencing data or whole genome sequencing data?

I see a warning: "Warning: Please note that the SPAdes is not recommended for large sequencing dataset > 5 GB. You can use a subset instead."

If mitochondrial reads are required, is there any way I can extract them from whole genome sequencing data?

chlorophyllb commented 1 year ago

1. You can use GetOrganelle with the -s parameter to set your reference as the seed sequence. GetOrganelle then runs SPAdes internally to assemble the mitochondrial genome from extended_1_paired.fq/extended_2_paired.fq, which it extracts from your raw sequencing data.
2. Or use bwa/samtools to extract mitochondrial reads.
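
For the bwa/samtools route, one possible sequence of commands is sketched below as a small Python helper that only builds and prints the command lines. This is an assumption about a typical workflow, not something GSAT or GetOrganelle prescribes: the file names (mito_ref.fasta, sample_R1.fq.gz, and so on) and the helper name are hypothetical, and the flags should be checked against your installed bwa and samtools versions.

```python
import shlex


def mito_read_extraction_commands(ref, r1, r2, prefix, threads=4):
    """Build shell commands that map paired reads to a mitochondrial
    reference and write the mapped reads back out as FASTQ.

    All paths are placeholders supplied by the caller (hypothetical names).
    """
    q = shlex.quote
    mapped_bam = f"{prefix}.mapped.bam"
    sorted_bam = f"{prefix}.namesorted.bam"
    return [
        # 1. Index the reference mitogenome(s).
        f"bwa index {q(ref)}",
        # 2. Map the reads and drop unmapped reads (SAM flag 4).
        f"bwa mem -t {int(threads)} {q(ref)} {q(r1)} {q(r2)} "
        f"| samtools view -b -F 4 -o {q(mapped_bam)} -",
        # 3. Sort by read name so mates are adjacent.
        f"samtools sort -n -o {q(sorted_bam)} {q(mapped_bam)}",
        # 4. Convert to FASTQ; reads whose mate did not map go to the -s file.
        f"samtools fastq -1 {q(prefix + '_1.fq')} -2 {q(prefix + '_2.fq')} "
        f"-s {q(prefix + '_single.fq')} {q(sorted_bam)}",
    ]


if __name__ == "__main__":
    for cmd in mito_read_extraction_commands(
        "mito_ref.fasta", "sample_R1.fq.gz", "sample_R2.fq.gz", "mito"
    ):
        print(cmd)
```

The printed commands can be reviewed and then run in a shell. How many reads are recovered depends on how closely the reference matches your species.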

hwc2021 commented 1 year ago

Hi!

Either a whole genome sequencing (WGS) dataset or mitochondrial genome sequencing data is fine for GSAT. So, you can just extract a subset of your sequencing dataset (e.g., 10% of your reads) using the seqkit software.

Or, if you have enough suitable reference mitogenomes, you can also choose to extract mitochondrial reads as suggested by chlorophyllb. If not, a WGS dataset is preferable.
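
To illustrate the subsampling idea, here is a minimal pure-Python sketch that keeps roughly 10% of read pairs. It is not how seqkit is implemented; in practice seqkit's `sample` subcommand with a proportion (`-p 0.1`) and the same random seed (`-s`) for both paired files should do the same job much faster. The function names and the synthetic reads are hypothetical. The key point is that one random draw decides the fate of both mates, so the two output files stay in sync.

```python
import random


def read_fastq(lines):
    """Yield FASTQ records as 4-tuples (header, sequence, plus, quality)."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def subsample_pairs(r1_lines, r2_lines, proportion=0.1, seed=11):
    """Keep each read pair with probability `proportion`.

    A single seeded draw per pair keeps R1 and R2 synchronized.
    """
    rng = random.Random(seed)
    kept_r1, kept_r2 = [], []
    for rec1, rec2 in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        if rng.random() < proportion:
            kept_r1.append(rec1)
            kept_r2.append(rec2)
    return kept_r1, kept_r2


def make_reads(n, mate):
    """Generate synthetic FASTQ lines for demonstration only."""
    lines = []
    for i in range(n):
        lines += [f"@read{i}/{mate}", "ACGTACGT", "+", "IIIIIIII"]
    return lines


if __name__ == "__main__":
    r1, r2 = subsample_pairs(make_reads(1000, 1), make_reads(1000, 2))
    print(f"kept {len(r1)} of 1000 pairs")
```

For real data, the line iterables would come from opened FASTQ files (for example via `gzip.open(path, "rt")`), and the kept records would be written out four lines at a time.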