head Blastquery-GOslim.tab
NC_035786.1:35005823-35053811 GO:0000002 cell organization and biogenesis P
NC_035784.1:81157412-81161455 GO:0000002 cell organization and biogenesis P
NC_035785.1:6920706-6928297 GO:0000002 cell organization and biogenesis P
NC_035788.1:25841487-25842908 GO:0000002 cell organization and biogenesis P
NC_035788.1:84032938-84038358 GO:0000002 cell organization and biogenesis P
NC_035788.1:84034120-84038279 GO:0000002 cell organization and biogenesis P
NC_035788.1:65057880-65082026 GO:0000002 cell organization and biogenesis P
NC_035781.1:55490481-55502256 GO:0000002 cell organization and biogenesis P
NC_035782.1:1147125-1157055 GO:0000002 cell organization and biogenesis P
NC_035784.1:87478898-87496133 GO:0000002 cell organization and biogenesis P
Methods
Resource Development: Canonical Genes
A fasta file of all genes in the oyster genome was generated.
This fasta file is available: https://d.pr/f/nfzK36 (400MB)
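The gene identifiers in the GO Slim table (for example NC_035786.1:35005823-35053811) look like chromosome:start-end regions. As a rough illustration of how one gene sequence could be cut from a genome using such an identifier, here is a minimal Python sketch. The function names are hypothetical, the tool actually used is not stated here, and the coordinate convention (0-based start, end-exclusive) is an assumption that should be checked against the real files.

```python
import re

# Matches identifiers of the form "chrom:start-end".
REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region_id):
    """Split 'chrom:start-end' into (chrom, start, end) with integer coordinates."""
    match = REGION_RE.match(region_id)
    if match is None:
        raise ValueError(f"not a chrom:start-end identifier: {region_id!r}")
    return match["chrom"], int(match["start"]), int(match["end"])


def extract_region(genome, region_id):
    """Return the subsequence named by region_id.

    genome is a dict mapping chromosome name to sequence string.
    ASSUMPTION: start is 0-based and end is exclusive; adjust if the
    identifiers turn out to be 1-based.
    """
    chrom, start, end = parse_region(region_id)
    return genome[chrom][start:end]


# Toy genome, not real oyster sequence.
toy_genome = {"chr1": "ACGTACGT"}
toy_gene = extract_region(toy_genome, "chr1:2-6")
print(toy_gene)
```

Keeping the parsing step separate from the slicing step makes it easy to change the coordinate convention in one place.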
This gene level (genomic) fasta file was annotated with blast (blastout).
For our purposes, GO Slim information is needed; it was generated by joining the blast output with UniProt tables.
The file with GO Slim Information
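The join described above can be sketched in Python as a lookup on a shared accession column. This is only an illustration: the function name, the column layout of both inputs, and the accessions (ACC_A, ACC_B, ACC_MISSING) are made-up placeholders, and the actual join may have been done with different tools and columns. The output columns follow the layout seen in Blastquery-GOslim.tab (gene ID, GO ID, GO Slim term, aspect).

```python
def join_blast_with_goslim(blast_hits, goslim_table):
    """Inner-join blast hits to GO Slim rows on the accession.

    blast_hits: iterable of (gene_id, accession)
    goslim_table: iterable of (accession, go_id, slim_term, aspect)
    Both layouts are assumptions made for this sketch.
    """
    # Index GO Slim rows by accession; one accession can map to several terms.
    slim_by_accession = {}
    for accession, go_id, slim_term, aspect in goslim_table:
        slim_by_accession.setdefault(accession, []).append((go_id, slim_term, aspect))

    joined = []
    for gene_id, accession in blast_hits:
        # Genes whose accession has no GO Slim row are dropped (inner join).
        for go_id, slim_term, aspect in slim_by_accession.get(accession, []):
            joined.append((gene_id, go_id, slim_term, aspect))
    return joined


# Placeholder accessions; the last gene ID is hypothetical as well.
blast_hits = [
    ("NC_035786.1:35005823-35053811", "ACC_A"),
    ("NC_035784.1:81157412-81161455", "ACC_B"),
    ("hypothetical_gene:1-100", "ACC_MISSING"),
]
goslim_table = [
    ("ACC_A", "GO:0000002", "cell organization and biogenesis", "P"),
    ("ACC_B", "GO:0000002", "cell organization and biogenesis", "P"),
]

rows = join_blast_with_goslim(blast_hits, goslim_table)
for row in rows:
    print("\t".join(row))
```

An inner join is used here so that only genes with a GO Slim term appear in the result; a left join would be the alternative if unannotated genes need to be kept.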
Resource Development: Consensus Gene Sequence for 91 samples
Using Combined.SNP.TRSdp5g95FnDNAmaf05.vcf.gz (31GB), separate VCF files were derived for each library.
Full genome sequences were generated using the individual VCF files from each library.
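To make the idea of a per-library consensus genome concrete, here is a minimal Python sketch that substitutes SNP alleles into a reference sequence. It is not the procedure used here, which is not described on this page. The function name is hypothetical, and the sketch assumes that every listed site carries the alternate allele in the sample, that positions are 1-based as in the VCF POS column, and that only single-base substitutions are applied.

```python
def apply_snps(reference, snps):
    """Return reference with SNP alternate alleles substituted.

    snps: iterable of (pos, ref, alt), with pos 1-based as in VCF.
    ASSUMPTION: the sample carries the alt allele at every listed site.
    """
    bases = list(reference)
    for pos, ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels and other non-SNP records in this sketch
        index = pos - 1
        if bases[index].upper() != ref.upper():
            raise ValueError(f"reference mismatch at position {pos}")
        bases[index] = alt
    return "".join(bases)


# Toy reference and variants, not real data.
toy_reference = "ACGTACGT"
toy_snps = [(2, "C", "T"), (8, "T", "A")]
toy_consensus = apply_snps(toy_reference, toy_snps)
print(toy_consensus)
```

Checking the reference base before substituting catches VCF records that do not match the genome build being edited.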
Gene level fasta files were then extracted for all samples.
These 91 fasta files are available here, both full genome ({}.fa) and gene level ({}_GENE.fa) (jupyter notebook).
CpG Observed / Expected Ratio Calculations
The ratio was determined for all genes in all 91 samples, and a single file with all data was created.
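For readers unfamiliar with the measure, a CpG observed / expected ratio can be sketched in Python as follows. One widely used definition is (number of CG dinucleotides × sequence length) / (number of C × number of G). The exact normalization used in the notebook is not stated here and may differ, and the function names and toy sequences below are hypothetical.

```python
def cpg_oe(seq):
    """CpG observed/expected ratio: (CpG count * length) / (C count * G count).

    Returns None when the sequence has no C or no G, since the
    expected count is then zero and the ratio is undefined.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    if c_count == 0 or g_count == 0:
        return None
    return (cpg_count * len(seq)) / (c_count * g_count)


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of fasta lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Toy records, not real gene sequences.
toy_fasta = [">geneA", "ACGTACGT", ">geneB", "AATT", "TTAA"]
ratios = {name: cpg_oe(seq) for name, seq in read_fasta(toy_fasta)}
print(ratios)
```

Running the same function over each sample's gene level fasta file and collecting the results per gene would give one table covering all samples.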