hputnam closed 4 months ago
Checked that the gene_ids in the gene count matrix file and the gff3 file match exactly in the DESeq2 code before starting any analyses here; see lines 185-198 for the comparison: DESeq2
However, looking back at my functional annotation assembly script, the output file has 27,439 unique genes but some gene_ids still do not match the gff3. Tracking a few of the unmatched genes through the script (e.g. "Pver_g408", "Pver_g3070", "Pver_novel_701_5de57afd") shows that the functional annotation file, which was blasted against the original protein fasta file, has naming issues. I am currently looking into where the mismatch is introduced and what pattern it follows.
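A minimal sketch of that kind of check, written in Python for illustration rather than taken from the actual Rmd script: it lists the gene_ids present in only one of the two files and counts the unmatched IDs by prefix to help spot a pattern. The function names, the ID lists, and the ".t1" suffix are hypothetical; the real naming issue has not been identified here.

```python
import re
from collections import Counter


def compare_gene_ids(annotation_ids, gff_ids):
    """Return the sorted IDs found only in the annotation and only in the gff3."""
    annotation, gff = set(annotation_ids), set(gff_ids)
    return sorted(annotation - gff), sorted(gff - annotation)


def prefix_counts(ids):
    """Count IDs by their leading non-digit prefix, e.g. 'Pver_g' or 'Pver_novel_'."""
    return Counter(re.match(r"\D*", gene_id).group() for gene_id in ids)


# Hypothetical ID lists; the ".t1" suffix is an invented example of a mismatch.
annotation_ids = ["Pver_g408.t1", "Pver_g3070.t1", "Pver_g1"]
gff_ids = ["Pver_g408", "Pver_g3070", "Pver_g1", "Pver_novel_701_5de57afd"]

only_annotation, only_gff = compare_gene_ids(annotation_ids, gff_ids)
print("Only in annotation:", only_annotation)
print("Only in gff3:", only_gff)
print("Unmatched gff3 prefixes:", dict(prefix_counts(only_gff)))
```

Grouping by prefix is only one way to look for a pattern; if the mismatch turns out to be a suffix or a separator, the regular expression would need to change accordingly.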
@daniellembecker check the code here for how I dealt with it: https://github.com/hputnam/Poc_RAPID/blob/main/RAnalysis/scripts/Embryo_GeneExpression.Rmd
@hputnam I think your repo is private; the link isn't working and I don't see it on your repo list
try again
All now match after the edits on lines 188-229: code
Your annotation file should have 27,439 genes that were identified by blasting the protein fasta file, which clearly identifies each protein name and links it directly to the original genome paper. Therefore, you need to ensure that the gene ids in your gene count matrix file match it for annotation of DEGs.
Also, the gff3 file needs to match the gene ids exactly, both to calculate the correct gene lengths and to join the files for GO analysis.
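To illustrate why the exact match matters for gene lengths, here is a rough Python sketch (not the code in the linked Rmd) that reads gene features from gff3 lines, computes each length as end - start + 1, and reports any count matrix gene ids with no gff3 entry. The scaffold names, coordinates, and function name are hypothetical, and it assumes gene length means the full span of the gene feature and that the gene id sits in the `ID=` attribute.

```python
def gene_lengths_from_gff3(lines):
    """Map gene ID -> length (end - start + 1) for 'gene' features in gff3 lines."""
    lengths = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # keep only well-formed gene features
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        gene_id = attributes.get("ID")
        if gene_id is not None:
            # gff3 coordinates are 1-based and inclusive
            lengths[gene_id] = int(fields[4]) - int(fields[3]) + 1
    return lengths


# Hypothetical gff3 records and count matrix ids, for illustration only.
gff3_lines = [
    "##gff-version 3",
    "scaffold_1\texample\tgene\t101\t600\t.\t+\t.\tID=Pver_g408",
    "scaffold_1\texample\tgene\t1000\t1999\t.\t-\t.\tID=Pver_g3070;Name=example",
]
count_matrix_ids = ["Pver_g408", "Pver_g3070"]

lengths = gene_lengths_from_gff3(gff3_lines)
missing = [gene_id for gene_id in count_matrix_ids if gene_id not in lengths]
print(lengths)
print("Count matrix ids without a gff3 length:", missing)
```

Any id that shows up in `missing` would drop out of a join on gene id, so it would have no length for the GO analysis.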