Closed schultzm closed 8 years ago
Would be great if you could just put your edits into the original script and push the changes.
On 22 March 2016 at 3:36:04 pm, Mark Schultz (notifications@github.commailto:notifications@github.com) wrote:
Actually, it's a bit more than the above. Need to remove the whole "(gi:xxxxxxxx)" bit. Here's how I did it using python:
with open("VFDB_setB_nt.fas.ffn", "r") as input_handle: with open("VFDB_setB_nt.fas.ffn_edit", "w") as output_handle: for line in input_handle: if ' (gi:' in line: line = line.split() output_handle.write(line[0]+' '+' '.join(line[2:])+'\n') else: output_handle.write(line)
— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHubhttps://github.com/katholt/srst2/issues/59#issuecomment-199634498
Error in step: Using the VFDB Virulence Factor Database with SRST2
1: The URL to the VFDB has changed and, using wget, should now be:
wget http://www.mgc.ac.cn/VFs/Down/VFDB_setB_nt.fas.gz
2: We need three extra steps for the database parsing scripts to work.
i)
gunzip VFDB_setB_nt.fas.gz
ii)mv VFDB_setB_nt.fas VFDB_setB_nt.fas.ffn
iii) save the following code as convert.py and execute aspython convert.py
, which gets rid of the "(gi:xxxxx)" bit in the sequence headers and the commas (replacing with '_') because downstream processing splits on the commas:Without removing the text "(gi:xxxxx)" from the header, this next step will not work:
Also, there is a redundant comment on line 46 in the script VFDB_cdhit_to_csv.py (should read "VFGxxxxxx"):