Open vinuesa opened 3 years ago
Here is the modified script, adapted for COG2020: https://github.com/SolayMane/FOA_scripts/blob/main/merge_cog202.py
Wrote a second modification for Python 3 and to find the COG IDs for all sequences: https://github.com/kkpenn/merger_COG2020/blob/main/merger_2.py
Hi Sam, thanks for notifying the update. This is great news. Take care
Hello Sam,
Thank you very much for your Python scripts for COG annotation. I'm just wondering whether the script DIAMOND_COG_Analysis_Counter.py is compatible with the recent update to the COG2020 database. I tried following your documentation here: https://github.com/transcript/COG. However, when I reached the part on "Analyzing the results of a DIAMOND search against the COG database", I encountered this error:
python DIAMOND_COG_analysis_counter.py -I LbSa_Pan.cogs -O LbSaResults.cog -D merged_cogs.fa
Analysis of LbSa_Pan.cogs complete. Number of total lines: 67960 Number of unique sequences: 3181 Time elapsed: 0.32341003418 seconds.
Starting database analysis now.
Traceback (most recent call last):
File "DIAMOND_COG_analysis_counter.py", line 125, in
The version of DIAMOND_COG_Analysis_Counter.py that I used is the one that has been updated for time.clock().
Again thank you very much, I appreciate any help. Stay safe!
Hello @Rmmendoza. You can deal with this problem by deleting the pipe character | in the script.
I have a question: how can I deal with this problem?
org = db_hier_dictionary[entry]
Analysis of test.cogs complete. Number of total lines: 1 Number of unique sequences: 1 Time elapsed: 0.000323057174683 seconds.
Starting database analysis now. 1000000 lines processed so far in 7.30374193192 seconds. 2000000 lines processed so far in 14.0681209564 seconds. 3000000 lines processed so far in 20.5442709923 seconds.
Success!
Time elapsed: 21.8866050243 seconds.
Number of lines: 3213025
Number of errors: 0
Traceback (most recent call last):
File "/home/abdelmalek/DIAMOND_COG_analysis_counter1.py", line 158, in
The protein with key WP_101828536_1 is absent from your dictionary db_hier_dictionary.
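For anyone hitting the same lookup failure, here is a minimal diagnostic sketch. It assumes the error is a KeyError raised at `org = db_hier_dictionary[entry]`. The helper name `report_missing` and the example IDs are hypothetical and not part of the original script.

```python
import difflib


def report_missing(hit_ids, db_hier_dictionary, max_suggestions=3):
    """Map each hit ID that is absent from the dictionary to similar-looking keys."""
    keys = list(db_hier_dictionary)
    missing = {}
    for hit_id in hit_ids:
        if hit_id not in db_hier_dictionary:
            # Fuzzy matching is slow on millions of keys, so only the
            # IDs that actually failed the exact lookup are compared.
            missing[hit_id] = difflib.get_close_matches(
                hit_id, keys, n=max_suggestions
            )
    return missing


# Hypothetical data, for illustration only.
example_db = {"WP_000000001.1": "COG0001 | X"}
print(report_missing(["WP_000000001_1", "WP_000000001.1"], example_db))
```

If a missing ID comes back with a near-identical key as a suggestion, the DIAMOND results and the database headers are probably formatted differently, rather than the protein being truly absent.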
Dear @SolayMane, could you please help me resolve this issue?
Analysis of 1pantoea.cogs complete. Number of total lines: 1 Number of unique sequences: 1 Time elapsed: 0.0009813308715820312 seconds.
Starting database analysis now.
Traceback (most recent call last):
File "DIAMOND_COG_analysis_counter1.py", line 125, in
@Malokidz, can you paste your Python code here?
Dear @SolayMane, I am using the DIAMOND_COG_analysis_counter.py script. My problem is in building a dictionary of the reference database.
db_hier_dictionary = {}
db_line_counter = 0
db_error_counter = 0

for line in db:
    if line.startswith(">") == True:
        db_line_counter += 1
        splitline = line.split("|")

        # ID, the hit returned in DIAMOND results
        db_id = str(splitline[0] + "|" + splitline[1] + "|" + splitline[2] + "|" + splitline[3] + "|")[1:]

        # name and functional description
        if "NO COG FOUND" in splitline[1]:
            db_hier = "NO HIERARCHY"
        else:
            hier_split = line.split("|")
            db_hier = hier_split[5] + " | " + hier_split[6].strip()

        # add to dictionaries
        db_hier_dictionary[db_id] = db_hier

        # line counter to show progress
        if db_line_counter % 1000000 == 0:  # each million
            t95 = time.time()
            print (str(db_line_counter) + " lines processed so far in " + str(t95-t2) + " seconds.")

t3 = time.time()

print ("\nSuccess!")
print ("Time elapsed: " + str(t3-t2) + " seconds.")
print ("Number of lines: " + str(db_line_counter))
print ("Number of errors: " + str(db_error_counter))
Dear @SolayMane, I am using the DIAMOND_COG_analysis_counter.py script for one of my analyses, but I get the following error even though the key is present in the dictionary. Could you please help me resolve this issue?
python Diamond.py -I test.cogs -O result.cogs -D merged_cogs.fa
Analysis of test.cogs complete. Number of total lines: 25 Number of unique sequences: 1 Time elapsed: 7.796287536621094e-05 seconds.
Starting database analysis now. 1000000 lines processed so far in 2.3694207668304443 seconds. 2000000 lines processed so far in 4.80712628364563 seconds. 3000000 lines processed so far in 7.313858270645142 seconds.
Success!
Time elapsed: 7.811172246932983 seconds.
Number of lines: 3213025
Number of errors: 0
Traceback (most recent call last):
File "/home/srmap/Desktop/sharayu/tools/COG/COG-master/Diamond.py", line 158, in
/Desktop/sharayu/tools/COG/COG-master$ grep "WP_003663563.1" merged_cogs.fa
WP_003663563.1 ribonucleoside-diphosphate reductase subunit alpha [Moraxella catarrhalis] | COG0209 | F
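A possible explanation, offered as a guess rather than a confirmed diagnosis: grep matches the ID anywhere in the header, while the script looks up a dictionary key built from several "|"-separated fields. The sketch below instead keys each header on its first whitespace-delimited token. It assumes headers have the form shown in the grep output above, and the function name `parse_header` is hypothetical, not taken from the original script.

```python
def parse_header(line):
    """Return (sequence_id, hierarchy) for one header line.

    Assumed header form, taken from the grep output in this thread:
    ID description [organism] | COGxxxx | category
    Only call this on non-empty header lines.
    """
    fields = [field.strip() for field in line.lstrip(">").split("|")]
    # The sequence ID is the first token before any whitespace.
    seq_id = fields[0].split()[0]
    if len(fields) >= 3:
        # The last two fields are the COG ID and the category letter(s).
        hierarchy = fields[-2] + " | " + fields[-1]
    else:
        hierarchy = "NO HIERARCHY"
    return seq_id, hierarchy


header = (
    ">WP_003663563.1 ribonucleoside-diphosphate reductase subunit alpha "
    "[Moraxella catarrhalis] | COG0209 | F"
)
print(parse_header(header))
```

With keys built this way, a DIAMOND hit reported as a bare accession would match the dictionary directly. Whether DIAMOND reports the bare accession depends on how the database was built, so check a few lines of the .cogs file first.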
I am new to coding. These are the fixes I made to get it working. Please let me know if there is something fundamentally wrong about how I approached these issues.
Hi
The code that you shared runs nicely. Thank you for it. Could you please tell me how I can use it for multiple genomes in one go?
Hi @Thewhitewolf8,
I'm not sure what you mean by multiple genomes in one go? Could you give me more details about the aim of your project?
I assume you are running this on Linux? If so, would manually creating a .sh file be useful to you?
The contents of the file would be as follows:
#!/bin/bash
python3 DIAMOND_COG2020_analysis_counter.py -I your_genome_1.cogs -O result_1.cogs -D merged_cogs.fa
python3 DIAMOND_COG2020_analysis_counter.py -I your_genome_2.cogs -O result_2.cogs -D merged_cogs.fa
python3 DIAMOND_COG2020_analysis_counter.py -I your_genome_3.cogs -O result_3.cogs -D merged_cogs.fa
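For many genomes, writing one line per file by hand gets tedious. The same idea can be sketched as a loop in Python. This assumes all DIAMOND result files sit in one directory and end in .cogs. The helper names and the `_result.txt` output naming are hypothetical; the script name and the -I/-O/-D flags are the ones used in the commands above.

```python
import subprocess
import sys
from pathlib import Path

# Script name as used in this thread; adjust to your local copy.
SCRIPT = "DIAMOND_COG2020_analysis_counter.py"


def build_commands(cogs_dir, database, script=SCRIPT):
    """Return one command (as an argument list) per .cogs file in cogs_dir."""
    commands = []
    for cogs_file in sorted(Path(cogs_dir).glob("*.cogs")):
        # Hypothetical naming: genome.cogs -> genome_result.txt, so that
        # results are not picked up as inputs on a second run.
        result = cogs_file.with_name(cogs_file.stem + "_result.txt")
        commands.append([
            sys.executable, script,
            "-I", str(cogs_file),
            "-O", str(result),
            "-D", database,
        ])
    return commands


def run_all(cogs_dir, database):
    """Run the analysis script once per input file, stopping on the first failure."""
    for command in build_commands(cogs_dir, database):
        subprocess.run(command, check=True)
```

Calling `run_all("path/to/cogs_files", "merged_cogs.fa")` would then process every .cogs file in that directory, producing one result file per genome.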
Thanks for your response. Actually, I have around 300 bacteriophage genomes, and I want to annotate them all together, so that the output shows the COG categorisation for around 1 lakh (100,000) ORFs. That is what I am not getting.
Hi @kkpenn,
I am using your script to update the COG files, but I am getting this error:
File "merger_2.py", line 16 print(f"\nCog file read. Time elapsed: {t1-t0} seconds.") ^ SyntaxError: invalid syntax
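A likely cause, stated as an assumption since the interpreter version is not given: f-strings were added in Python 3.6, so older interpreters (including Python 2) reject `print(f"...")` with exactly this kind of SyntaxError. Running the script with `python3` (3.6 or newer) should avoid it. Alternatively, the line can be rewritten with `str.format`, as in this minimal sketch. The function name `elapsed_message` is hypothetical and does not appear in merger_2.py.

```python
import time


def elapsed_message(t0, t1):
    """Build the same message as the f-string in merger_2.py, line 16."""
    # str.format is accepted by Python 2.7 and by every Python 3 release,
    # whereas f-strings need Python 3.6 or newer.
    return "\nCog file read. Time elapsed: {} seconds.".format(t1 - t0)


t0 = time.time()
t1 = time.time()
print(elapsed_message(t0, t1))
```

A version check inside the script itself would not help here, because the SyntaxError is raised while the file is being compiled, before any of its code runs.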
Hi @SolayMane ,
Your script is unavailable: the web page is not available.
Hi @ShwetaaPandey , you can find it here https://github.com/SolayMane/MyToolBox/blob/main/merge_cog202.py
Hi @SolayMane, I am getting an error while running your script:
Cog file read. Time elapsed: 3.680607 seconds.
Traceback (most recent call last):
File "merge_cog202.py", line 30, in
Hi Sam, thanks for sharing your code on GitHub. Your scripts work smoothly. I was just wondering if you were planning to update your merge.py script to process the updated COG2020 database file formats.
Reference https://academic.oup.com/nar/article/49/D1/D274/5964069 ftp://ftp.ncbi.nih.gov/pub/COG/COG2020/data