Closed mcn3159 closed 7 months ago
I have not tried to run your code, but S107-48 stands out as the only entry with strand 1 in your dict location_of_hits, whereas the others are strand -1. Also, your code does not seem to use that information (i.e. I do not see anything like location_of_hits[gbk_short][2] in your code).
Hi @mcn3159,
Looking at the code and result figure, I don't really understand what is strange. On what data basis did you think pyGenomeViz was plotting completely different chromosome positions? If you can specifically indicate the apparent discrepancies between your assumed results and the actual results, I might be able to suggest some advice.
Hi, thanks for getting back to me so soon!
@phyto The problem might be with the lack of strand information given. Is there a way to include that in add_feature_track or add_genbank_features? I didn't see it in the documentation, but I may have missed that.
Hopefully the following shows where the discrepancies are. I printed the protein product names I should be seeing in the given range of the gbk file, which differ from what I'm seeing in the figure.
from Bio import SeqIO

print(gbk_list[0])
range_start = int(gbk_list[0].min_range)
range_end = range_start + int(gbk_list[0].range_size)
for record in SeqIO.parse('mmseq_query_acetylhexosamine/gbk_2_12_24_analyses/S107-48.gbk', 'genbank'):
    for feature in record.features:
        # Report CDS features lying entirely inside the plotted range
        if feature.type == 'CDS' and feature.location.start > range_start and feature.location.end < range_end:
            print(feature.qualifiers.get('product'))
Which outputs
S107-48 (962,415.0 - 1,004,578.0 bp)
['acyltransferase family protein']
['VanZ family protein']
['ABC transporter ATP-binding protein']
['ABC transporter permease']
['cytidylate kinase-like family protein']
['FprA family A-type flavoprotein']
['lantibiotic protection ABC transporter ATP-binding protein']
['lantibiotic immunity ABC transporter MutE/EpiE family permease subunit']
['lantibiotic immunity ABC transporter MutG family permease subunit']
['response regulator transcription factor']
['HAMP domain-containing histidine kinase']
['arsenate reductase family protein']
['ABC-F type ribosomal protection protein']
['ABC transporter ATP-binding protein']
['ABC transporter ATP-binding protein']
['DUF4097 family beta strand repeat protein']
['DUF1700 domain-containing protein']
['PadR family transcriptional regulator']
['response regulator transcription factor']
['sensor histidine kinase']
['1,3-beta-galactosyl-N-acetylhexosamine phosphorylase']
['helix-turn-helix transcriptional regulator']
['LacI family DNA-binding transcriptional regulator']
['sugar ABC transporter permease']
['carbohydrate ABC transporter permease']
['extracellular solute-binding protein']
['glycoside hydrolase family 32 protein']
['chromate transporter']
['chromate transporter']
['Gfo/Idh/MocA family oxidoreductase']
['hypothetical protein']
['dipicolinate synthase subunit B']
['asparagine synthase (glutamine-hydrolyzing)']
['hydroxyethylthiazole kinase']
['thiamine phosphate synthase']
['HAD family phosphatase']
['bifunctional hydroxymethylpyrimidine kinase/phosphomethylpyrimidine kinase']
['Na+/H+ antiporter NhaC family protein']
['C39 family peptidase']
['aspartate--ammonia ligase']
Is there a way to specify which contig/record to use in the Genbank class? I think this would solve my problem.
There is probably no good solution to your problem in the pyGenomeViz API. It is best to remove non-target records in the Genbank file directly.
Thanks, here's the workaround I found to filter for a specific contig in the pyGenomeViz gbk object.
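contig_with_gene_of_interest = 'ctgS1000000F'
# Keep only the record whose id matches the target contig
records_to_keep = [rec for rec in gbk.records if rec.id == contig_with_gene_of_interest]
# Note: _records is a private attribute, so this may break in other pyGenomeViz versions
gbk._records = records_to_keep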
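For anyone who would rather not touch the private attribute, the earlier suggestion to remove non-target records from the GenBank file itself could look roughly like the sketch below. It uses Biopython's SeqIO.parse and SeqIO.write; the helper names keep_record and write_single_record and the output file name are made up for illustration, and I have not run this against the attached files.

```python
def keep_record(records, record_id):
    """Return only the records whose .id equals record_id."""
    return [rec for rec in records if rec.id == record_id]


def write_single_record(in_gbk, out_gbk, record_id):
    """Copy the record(s) with the given id from in_gbk into out_gbk.

    Hypothetical helper: reads a (possibly multi-record) GenBank file
    and writes a new file containing only the target contig.
    """
    from Bio import SeqIO  # Biopython, imported here so keep_record has no dependency

    kept = keep_record(SeqIO.parse(in_gbk, "genbank"), record_id)
    if not kept:
        raise ValueError(f"No record with id {record_id!r} in {in_gbk}")
    # SeqIO.write returns the number of records written
    return SeqIO.write(kept, out_gbk, "genbank")


if __name__ == "__main__":
    # Output file name is an assumption; the contig id is the one from the workaround above
    write_single_record("S107-48.gbk", "S107-48.ctgS1000000F.gbk", "ctgS1000000F")
```

The filtered file can then be loaded with the Genbank class as usual, so the coordinates passed for plotting refer to the only record in the file.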
Also ran into this issue, thanks @mcn3159 for the solution.
Hi,
Thanks for writing such a useful tool for comparative genomics.
I'm running into a strange issue when plotting a range from a gbk file: despite being given coordinates to plot, the figure shows genes at a completely different position on the chromosome than the ones passed in through the dictionary (this happens specifically for strain S107-48). I can't figure out why this is happening. Any suggestions? I attached the gbks I'm using along with the code below.
gbks.zip
Which should output