sjteresi / TE_Density

Python script calculating transposable element density for all genes in a genome. Publication: https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-022-00264-4
GNU General Public License v3.0

unequal set of chromosomes between datasets #133

Closed SolayMane closed 1 year ago

SolayMane commented 1 year ago

I ran the command as follows:

python process_genome.py $DATA_DIR/Cleaned_Fusarium_oxysporum_f.sp._albedinis_Foa_44.tsv $DATA_DIR/Cleaned_Fusarium_oxysporum_f.sp._albedinis_Foa_44_TE.tsv $GENOME -c $ROOT_DIR/config/production_run_config.ini -n 50 -o ou_TE_density

I got this error:

2023-04-11 13:46:55 inra ReviseAnno[65173] INFO Revised annotation has been saved: /home1/software/TE_Density/ou_TE_density/filtered_input_data/revised_input_data/Revised_Cleaned_Fusarium_oxysporum_f.sp._albedinis_Foa_44_TE.tsv
2023-04-11 13:46:55 inra PreProcessor[65173] 
Number of gene annotations split by chromosome != number of TE
                annotations split by chromosome.
                This error has arisen because you have some
                chromosomes in one annotation that do not exist in the other
                annotation. Sometimes this can happen if you have a small
                scaffold that does not have any gene or TE entries.
                TE Density cannot be calculated if there is an unequal set of
                chromosomes between datasets.
                Please trim your annotations so that they have the same number and
                set of chromosome IDs.
                Unique chromosomes in cleaned gene annotation: ['Chr_1', 'Chr_10', 'Chr_11', 'Chr_12', 'Chr_13', 'Chr_2', 'Chr_4', 'Chr_5', 'Chr_7', 'Chr_8', 'Chr_9', 'Contig1', 'Contig11', 'Contig12', 'Contig15', 'Contig17', 'Contig18', 'Contig19', 'Contig21', 'Contig22', 'Contig23', 'Contig24', 'Contig25', 'Contig26', 'Contig29', 'Contig3', 'Contig31', 'Contig32', 'Contig33', 'Contig34', 'Contig36', 'Contig38', 'Contig4', 'Contig42', 'Contig46', 'Contig49', 'Contig50', 'Contig51', 'Contig52', 'Contig54', 'Contig56', 'Contig59', 'Contig62', 'Contig65', 'Contig67', 'Contig68', 'Contig69', 'Contig7', 'Contig8', 'Contig9']
                Unique chromosomes in cleaned TE annotation: ['Chr_1', 'Chr_10', 'Chr_11', 'Chr_12', 'Chr_13', 'Chr_2', 'Chr_4', 'Chr_5', 'Chr_7', 'Chr_8', 'Chr_9', 'Contig1', 'Contig11', 'Contig12', 'Contig15', 'Contig17', 'Contig18', 'Contig19', 'Contig20', 'Contig21', 'Contig22', 'Contig23', 'Contig24', 'Contig25', 'Contig26', 'Contig29', 'Contig3', 'Contig31', 'Contig32', 'Contig33', 'Contig34', 'Contig36', 'Contig38', 'Contig4', 'Contig42', 'Contig46', 'Contig49', 'Contig50', 'Contig51', 'Contig52', 'Contig54', 'Contig56', 'Contig59', 'Contig62', 'Contig65', 'Contig67', 'Contig68', 'Contig69', 'Contig7', 'Contig8', 'Contig9']
sjteresi commented 1 year ago

Hi, it looks like you have Contig20 in your TE annotation but not in your gene annotation.
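To spot this kind of mismatch before running the pipeline, you can compare the chromosome IDs of the two cleaned annotations directly. The sketch below is a hypothetical helper, not part of TE_Density. It assumes both cleaned files are tab-separated with a header row and a column named "Chromosome"; that column name is an assumption, so adjust chrom_col if your files differ.

```python
# Hypothetical helper, not part of TE_Density.
# Assumes tab-separated files with a header row and a "Chromosome" column.
import csv


def unique_chromosomes(path, chrom_col="Chromosome"):
    """Return the set of chromosome IDs found in one cleaned annotation TSV."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[chrom_col] for row in reader}


def chromosome_mismatch(gene_chroms, te_chroms):
    """Return (only_in_genes, only_in_tes) as sorted lists of chromosome IDs."""
    gene_set, te_set = set(gene_chroms), set(te_chroms)
    return sorted(gene_set - te_set), sorted(te_set - gene_set)


# Example usage with the cleaned files from the command above:
# genes = unique_chromosomes("Cleaned_Fusarium_oxysporum_f.sp._albedinis_Foa_44.tsv")
# tes = unique_chromosomes("Cleaned_Fusarium_oxysporum_f.sp._albedinis_Foa_44_TE.tsv")
# print(chromosome_mismatch(genes, tes))
```

Applied to the two lists printed in the error log above, chromosome_mismatch returns ([], ['Contig20']): every chromosome in the gene annotation also has TEs, and Contig20 is the only TE chromosome without genes.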

SolayMane commented 1 year ago

I think this is because we detected TEs on Contig20, but no genes were predicted on that contig. Is that a problem for the pipeline?

sjteresi commented 1 year ago

Yes, please remove Contig20 from your TE annotation. The set of contig IDs needs to match between the two annotations.
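One way to trim the TE annotation without editing it by hand is to keep only the TE rows whose chromosome also appears in the gene annotation. This is a minimal sketch, not TE_Density code: the function names and output path are hypothetical, and it assumes tab-separated files with a header row and a "Chromosome" column.

```python
# Hypothetical trimming helper, not part of TE_Density.
# Assumes tab-separated files with a header row and a "Chromosome" column.
import csv


def filter_rows_by_chromosome(rows, keep, chrom_col="Chromosome"):
    """Yield only the rows (dicts) whose chromosome ID is in `keep`."""
    for row in rows:
        if row[chrom_col] in keep:
            yield row


def trim_te_annotation(gene_path, te_path, out_path, chrom_col="Chromosome"):
    """Write a copy of the TE annotation restricted to chromosomes that have genes."""
    # Collect every chromosome ID that appears in the gene annotation.
    with open(gene_path, newline="") as handle:
        gene_chroms = {row[chrom_col] for row in csv.DictReader(handle, delimiter="\t")}

    # Copy the TE header, then stream through only the rows on gene-bearing chromosomes.
    with open(te_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(filter_rows_by_chromosome(reader, gene_chroms, chrom_col))
```

Pass the trimmed file to process_genome.py as the TE annotation argument in place of the original. Filtering against the gene annotation, rather than deleting Contig20 by name, also catches any other TE-only scaffold. It does not handle the reverse case: a contig with genes but no TEs would still need to be removed from the gene annotation.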