Open nicholascdove opened 1 year ago
Hmm, maybe it's not a regex thing? I renamed the MIBiG gbks to try to match my gbks that worked.
My gbks that were parsed, inserted, and clustered looked like this: AIM000021_asm31892_contig20486033.region001.gbk
Original MIBiG gbk: BGC0002286.gbk
Trying to add a region string: BGC0002286.region001.gbk
Trying to break the BGC part of the regex definition so that it uses ^.+\\.region[0-9]+$: ABGC0002286.region001.gbk
Unfortunately, none of these naming "hacks" got BiG-SLiCE to recognize the MIBiG gbks as AntiSMASH gbks. Also, all of the files (my gbks and the MIBiG gbks) were in the same folder, so I don't think it's a directory issue.
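A quick way to check the naming theory is to test the file names against the region pattern directly. This is only a sketch: it assumes the pattern is applied to the file name with the .gbk extension removed, which is my guess and not something confirmed from the BiG-SLiCE source, and `matches_region_pattern` is a hypothetical helper.

```python
import re

# Region pattern quoted above, written as a Python raw string.
REGION_PATTERN = re.compile(r"^.+\.region[0-9]+$")


def matches_region_pattern(filename: str) -> bool:
    """Check a file name against the region pattern.

    Assumption: the pattern is matched against the name without its
    .gbk extension. This is a guess, not confirmed BiG-SLiCE behaviour.
    """
    stem = filename[:-len(".gbk")] if filename.endswith(".gbk") else filename
    return REGION_PATTERN.match(stem) is not None


for name in [
    "AIM000021_asm31892_contig20486033.region001.gbk",
    "BGC0002286.gbk",
    "BGC0002286.region001.gbk",
    "ABGC0002286.region001.gbk",
]:
    print(name, matches_region_pattern(name))
```

Under that assumption both renamed MIBiG files match the pattern just like my own gbks do, which fits with the file name not being the cause.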
Not actually "closed"; I just hit the wrong button. :)
Looks like my issue has more to do with the parse_gbk() function in bgc.py. On lines 98-170, there is an if/else statement that treats different versions of AntiSMASH gbks differently.
Line 98: if antismash_version.split(".")[0] in ["5", "6"]:
Line 170: else: # assume antiSMASH 4
The problem is that current MIBiG gbks do not have an AntiSMASH version.
So this if/else treats them like an antiSMASH 4 gbk and searches for a feature of type "cluster". It does not find one, and therefore does not recognize them as antiSMASH gbks.
Lines 170-182:
else:  # assume antiSMASH 4
    cluster = None
    for feature in gbk.features:
        if feature.type == "cluster":
            if cluster:  # contain 2 or more clusters
                cluster = None
                break
            else:
                cluster = feature
    if not cluster:
        print(orig_gbk_path +
              " is not a recognized antiSMASH clustergbk")
        break
Maybe this is the issue? Please let me know. Thanks!
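To make the branch selection concrete, here is a minimal sketch that reproduces only the check quoted from line 98. It assumes the version read from a MIBiG gbk ends up as the string "False"; `takes_v5_branch` is a hypothetical helper, not a function in bgc.py.

```python
def takes_v5_branch(antismash_version: str) -> bool:
    """Mirror the condition quoted from line 98 of bgc.py."""
    return antismash_version.split(".")[0] in ["5", "6"]


# "False" is an assumption about what a MIBiG gbk yields as its version.
for version in ["5.0.0", "6.1.1", "4.2.0", "False"]:
    if takes_v5_branch(version):
        branch = "antiSMASH 5/6 branch"
    else:
        branch = "antiSMASH 4 fallback"
    print(version, "->", branch)
```

Anything whose first dot-separated part is not "5" or "6" lands in the antiSMASH 4 fallback, which then looks for a single "cluster" feature.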
Okay, I figured it out. The assumption in my last comment was correct.
For others who are running into a similar issue, my workaround was to change the version in the MIBiG gbk from False to 5.0.0. You can use the following code in a for loop:
sed -i 's/Version :: False/Version :: 5.0.0/' BGC000001.gbk
I'm going to leave the issue open so the bug can be fixed in the package :)
Here is the command for batch modification:
for i in mibig_gbk_3.1/*.gbk; do sed -i 's/Version :: False/Version :: 5.0.0/' "$i"; done
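For anyone without GNU sed, the same batch edit can be sketched in Python. This is an illustrative equivalent of the loop above, not part of BiG-SLiCE: `patch_version` is a hypothetical helper, and the folder name is the one from my own setup. It edits files in place, so run it on a copy of the MIBiG folder.

```python
from pathlib import Path

OLD = "Version :: False"
NEW = "Version :: 5.0.0"


def patch_version(gbk_dir: str) -> int:
    """Replace the version string in every .gbk file directly inside gbk_dir.

    Returns the number of files that were changed.
    """
    changed = 0
    for path in sorted(Path(gbk_dir).glob("*.gbk")):
        text = path.read_text()
        if OLD in text:
            # Overwrite the file in place with the patched text.
            path.write_text(text.replace(OLD, NEW))
            changed += 1
    return changed


if __name__ == "__main__":
    # Folder name taken from the shell loop above; adjust to your layout.
    print(patch_version("mibig_gbk_3.1"), "files patched")
```

Files that do not contain the old version string are left untouched, so rerunning it is harmless.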
Hi Satria,
Thanks for the great package. I'm having difficulty clustering gbks from MIBiG.
I made an input folder, downloaded MIBiG, and placed the gbks in the input folder.
I did the same for my own gbks run through AntiSMASH.
I also made a dummy manifest and taxonomy file (I don't use the sqlite db, I end up parsing it and joining taxonomy from a separate database).
During the parsing and inserting step of the run, I get: gbks/BGC0000056.gbk is not a recognized antiSMASH clustergbk
I get the same message for each MIBiG gbk, while my own gbks seem to work. Can you help? I'm wondering if it has to do with the eligible regex definitions on a newer release of MIBiG? I'd try to debug it myself, but my programming skills are pretty novice.
Thanks! Nicholas