ndreey / ghost-magnet

Molecular Bioinformatics BSc thesis project at University of Skövde
MIT License

Create in silico NGS data of Platanthera spp. #4

Closed ndreey closed 1 year ago

ndreey commented 1 year ago

Create in silico NGS data of Platanthera spp. using the reference genome. Either ART or NGSNGS will be used.

ndreey commented 1 year ago

Because the scp command failed when I tried to copy the P_ziji file to my local machine, I will use the A. thaliana reference genome we just removed, GCF_000001735.4_TAIR10.1_genomic.fna.gz, as practice data. Because NGSNGS is supposedly faster and was created by bioinformaticians at KU, I will use it instead of ART.

EDIT: The scp command failed because I was running it from WSL; when I ran the same command in pwsh it worked. I assume my WSL is not set up with an ssh-agent, which would explain the failure.

ndreey commented 1 year ago

Determine how to set the seed and have NGSNGS generate reads of P_zinji three times (maybe four or five?) so we don't dilute with the same reads across all samples.
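One way to get independent read sets is to give every replicate its own seed. The sketch below only builds the command lines and does not run anything. The flags (`-i`, `-r`, `-l`, `-seq`, `-q1`, `-f`, `-s`, `-o`) are written from memory of the NGSNGS README and should be checked against `ngsngs --help` before use; the file names, read count, and read length are hypothetical placeholders.

```python
from typing import List


def build_ngsngs_commands(
    reference: str,
    n_replicates: int,
    base_seed: int = 1000,
    n_reads: int = 100_000,
    read_length: int = 150,
    quality_profile: str = "quality_profile_R1.txt",  # hypothetical file name
) -> List[List[str]]:
    """Build one NGSNGS command per replicate, each with a distinct seed.

    Flag names are assumptions taken from the NGSNGS README; verify them
    with `ngsngs --help` for the installed version.
    """
    commands = []
    for i in range(n_replicates):
        seed = base_seed + i  # distinct seed -> different reads per replicate
        commands.append([
            "ngsngs",
            "-i", reference,
            "-r", str(n_reads),
            "-l", str(read_length),
            "-seq", "SE",
            "-q1", quality_profile,
            "-f", "fq",
            "-s", str(seed),
            "-o", f"replicate_{i + 1}",
        ])
    return commands


commands = build_ngsngs_commands("reference.fna", n_replicates=3)
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the flags are confirmed. Recording the seeds alongside the output files keeps the replicates reproducible.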

ndreey commented 1 year ago

The practice reference had seven sequences: five chromosomes, one mitochondrial genome, and one chloroplast genome. I split the file to keep only the chloroplast sequence. After many tries, I am able to create both SE and PE fastq reads.
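Splitting a multi-sequence reference down to a single record can be sketched in plain Python as below. This is not the command used in this issue, only a minimal illustration; the sequence IDs in the toy example are made up, and the real ID has to be read from the header lines of the reference file.

```python
def extract_fasta_record(fasta_text: str, wanted_id: str) -> str:
    """Return the FASTA record whose header ID equals wanted_id.

    The ID is taken as the first whitespace-delimited token after '>'.
    Returns an empty string if no record matches.
    """
    kept = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            # Start keeping lines only when the header ID matches.
            keep = bool(tokens) and tokens[0] == wanted_id
        if keep:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Toy input with hypothetical IDs, standing in for the real reference.
toy_fasta = (
    ">seqA toy chromosome\n"
    "ACGTACGT\n"
    ">seqB toy organelle\n"
    "GGCC\n"
    "TTAA\n"
)
record = extract_fasta_record(toy_fasta, "seqB")
print(record, end="")
```

For the gzipped reference, the text can be read with `gzip.open(path, "rt").read()` first. That loads the whole file into memory, which is fine for a genome of this size but would need a streaming version for larger references.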

Single End

The question now lies in understanding the required quality profile file.

ndreey commented 1 year ago

Update:

What dawned on me, though, is that if I were to dilute a 4.5 GB mock data file until P. zijinensis makes up 90% of the reads, the file would grow to ~45 GB. Not good.
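The arithmetic behind that estimate, as a small sketch. It assumes file size scales linearly with the number of reads and that the mock data stays at its original size, so the mock ends up as the remaining 10% of the final file.

```python
def diluted_total_size(mock_size_gb: float, target_fraction: float) -> float:
    """Total file size after adding reads until they form target_fraction.

    Assumes size is proportional to read count and the mock data is kept
    whole, so mock_size = (1 - target_fraction) * total.
    """
    if not 0 <= target_fraction < 1:
        raise ValueError("target_fraction must be in [0, 1)")
    return mock_size_gb / (1 - target_fraction)


mock_gb = 4.5
total_gb = diluted_total_size(mock_gb, 0.90)
added_gb = total_gb - mock_gb
print(f"{total_gb:.1f} GB total, {added_gb:.1f} GB of added reads")
# prints "45.0 GB total, 40.5 GB of added reads"
```

Going the other way, subsampling the mock data before dilution would shrink the total by the same factor.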

ndreey commented 1 year ago

Closing as I will run CAMISIM instead.