slimsuite / chromsyn

Chromosome-level synteny plotting using orthologous regions
GNU General Public License v3.0

Script terminates at the feature file function #5

Open mzywj2 opened 3 months ago

mzywj2 commented 3 months ago

Hi,

An error appears for the feature file, as shown in the attached image. To see if it would move forward, I added an empty ft.fofn file, but there is no difference. Does the script require GFF files converted to TSV, plus a ft.fofn listing those files? I'm currently running AUGUSTUS on my FASTA files and converting the GFF output to TSV to see if this solves the issue, but I'm a little confused, since neither the test dataset nor the guidelines mention the need for such a file.

Thank you

[Attached image: Screenshot 2024-06-06 152255]

slimsuite commented 3 months ago

Hi. I'd like to help you solve this but I need more information about how you are trying to run it and what the input files look like. It shouldn't need a ft.fofn file to run. It looks like the current error is being caused by the fofn file having the wrong format. Every *.fofn file needs two columns: the genome alias, and the corresponding file path.
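As a rough way to check the two-column layout described above before running chromsyn, something like the following Python sketch could be used. It is not part of chromsyn: the function name `check_fofn` is hypothetical, and it assumes the two columns are separated by whitespace, which may not match every setup.

```python
from pathlib import Path


def check_fofn(fofn_path):
    """Return (line_number, problem) tuples for a two-column FOFN file.

    Assumption (not confirmed by the thread): columns are whitespace-separated,
    with the genome alias first and the file path second.
    """
    problems = []
    lines = Path(fofn_path).read_text().splitlines()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        # Split only once so that a path containing spaces stays in one piece.
        fields = line.split(maxsplit=1)
        if len(fields) != 2:
            problems.append((lineno, "expected two columns: alias and file path"))
            continue
        alias, file_path = fields[0], fields[1].strip()
        if not Path(file_path).is_file():
            problems.append((lineno, f"file not found for {alias}: {file_path}"))
    return problems
```

Calling `check_fofn("ft.fofn")` returns an empty list when every non-blank line has an alias and a path that exists on disk; otherwise each tuple names the offending line.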

mzywj2 commented 3 months ago

Hi,

Thank you for getting back to me. I'm running the script on the R terminal using Rscript chromsyn.R

I've added a feature and ft.fofn file as well and the script now runs till the end but I get a different error now (image attached)

The input file names are attached in an image as well, and the fofn files are also attached.

Please let me know if you would like any more of the input files?

Kind Regards


mzywj2 commented 3 months ago

Hi,

I'll also copy the full output I get in the terminal when I run the script.

Thank you


[Fri Jun 7 17:33:40 2024] Sequence FOFN File: sequences.fofn
[Fri Jun 7 17:33:40 2024] No region data file given.
[Fri Jun 7 17:33:40 2024] TIDK FOFN file: tidk.fofn
[Fri Jun 7 17:33:40 2024] Assembly gap FOFN file: gaps.fofn
[Fri Jun 7 17:33:40 2024] features table FOFN file: ft.fofn
[Fri Jun 7 17:33:40 2024] BUSCO full FOFN file: busco.fofn
[Fri Jun 7 17:33:40 2024] BUSCO FOFN file: busco.fofn
[Fri Jun 7 17:33:41 2024] #RCODE Setup complete.
[Fri Jun 7 17:33:41 2024] #FOFN 2 filenames loaded from sequences.fofn
[Fri Jun 7 17:33:41 2024] #FOFN 2 filenames loaded from busco.fofn
[Fri Jun 7 17:33:41 2024] #FOFN 2 filenames after filtering to recognised genomes.
Joining with by = join_by(Genome)
[Fri Jun 7 17:33:41 2024] Genomes (order=LIST): C_officinalis, Cexcelsa_scaf_1
[1] "Checking TIDK file:"
[1] TRUE
[Fri Jun 7 17:33:41 2024] #FOFN 2 filenames loaded from tidk.fofn
[Fri Jun 7 17:33:41 2024] #FOFN 2 filenames after filtering to recognised genomes.
[1] "tidkfiles:"
'data.frame': 2 obs. of 2 variables:
 $ Genome: chr "C_officinalis" "Cexcelsa_scaf_1"
 $ TIDK : chr "C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis_telomeric_repeat_windows.csv" "C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1_telomeric_repeat_windows.csv"
NULL
Genome
1 C_officinalis
2 Cexcelsa_scaf_1
TIDK
1 C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis_telomeric_repeat_windows.csv
2 C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1_telomeric_repeat_windows.csv
Joining with by = join_by(Genome)
[1] "gendb after TIDK join:"
'data.frame': 2 obs. of 4 variables:
 $ Genome : chr "C_officinalis" "Cexcelsa_scaf_1"
 $ SeqFile: chr "C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.telomeres.tdt" "C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.telomeres.tdt"
 $ BUSCO : chr "C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.busco5.tsv" "C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa.busco5.tsv"
 $ TIDK : chr "C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis_telomeric_repeat_windows.csv" "C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1_telomeric_repeat_windows.csv"
NULL
Genome
1 C_officinalis
2 Cexcelsa_scaf_1
SeqFile
1 C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.telomeres.tdt
2 C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.telomeres.tdt
BUSCO
1 C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.busco5.tsv
2 C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa.busco5.tsv
TIDK
1 C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis_telomeric_repeat_windows.csv
2 C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1_telomeric_repeat_windows.csv
[1] "Checking gaps file:"
[1] TRUE
[Fri Jun 7 17:33:41 2024] #FOFN 2 filenames loaded from gaps.fofn
[Fri Jun 7 17:33:41 2024] #FOFN 2 filenames after filtering to recognised genomes.
[1] "gapfiles:"
'data.frame': 2 obs. of 2 variables:
 $ Genome: chr "C_officinalis" "Cexcelsa_scaf_1"
 $ gaps : chr "C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.gaps.tdt" "C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.gaps.tdt"
NULL
Genome
1 C_officinalis
2 Cexcelsa_scaf_1
gaps
1 C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.gaps.tdt
2 C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.gaps.tdt
Joining with by = join_by(Genome)
[1] "Checking features file:"
[1] TRUE
[Fri Jun 7 17:33:41 2024] #FOFN 2 filenames loaded from ft.fofn
[Fri Jun 7 17:33:41 2024] #FOFN 2 filenames after filtering to recognised genomes.
[1] "ftfiles:"
'data.frame': 2 obs. of 2 variables:
 $ Genome : chr "C_officinalis" "Cexcelsa_scaf_1"
 $ features: chr "C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.tsv" "C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.tsv"
NULL
Genome features
1 C_officinalis C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.tsv
2 Cexcelsa_scaf_1 C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.tsv
Joining with by = join_by(Genome)
[1] "gendb after features join:"
'data.frame': 2 obs. of 6 variables:
 $ Genome : chr "C_officinalis" "Cexcelsa_scaf_1"
 $ SeqFile : chr "C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.telomeres.tdt" "C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.telomeres.tdt"
 $ BUSCO : chr "C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.busco5.tsv" "C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa.busco5.tsv"
 $ TIDK : chr "C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis_telomeric_repeat_windows.csv" "C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1_telomeric_repeat_windows.csv"
 $ gaps : chr "C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.gaps.tdt" "C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.gaps.tdt"
 $ features: chr "C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.tsv" "C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.tsv"
NULL
Genome
1 C_officinalis
2 Cexcelsa_scaf_1
SeqFile
1 C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.telomeres.tdt
2 C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.telomeres.tdt
BUSCO
1 C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.busco5.tsv
2 C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa.busco5.tsv
TIDK
1 C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis_telomeric_repeat_windows.csv
2 C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1_telomeric_repeat_windows.csv
gaps
1 C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.gaps.tdt
2 C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.gaps.tdt
features
1 C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.tsv
2 C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.tsv
[Fri Jun 7 17:33:41 2024] #GENOME 2 genomes: C_officinalis, Cexcelsa_scaf_1
[Fri Jun 7 17:33:42 2024] C_officinalis...
[Fri Jun 7 17:33:42 2024] #SEQS 5 C_officinalis sequences loaded from C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.telomeres.tdt
[Fri Jun 7 17:33:42 2024] #SEQS 5 C_officinalis sequences meet minlen cutoff of 0 bp
[Fri Jun 7 17:33:42 2024] #BUSCOV BUSCO v5 format
[Fri Jun 7 17:33:42 2024] #BUSCO 473 C_officinalis Complete BUSCO genes loaded from C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.busco5.tsv
[Fri Jun 7 17:33:42 2024] #BUSCO 473 C_officinalis Complete BUSCO genes following filtering to 5 sequences.
[Fri Jun 7 17:33:42 2024] #BUSCOV BUSCO v5 format
[Fri Jun 7 17:33:42 2024] #BUSCO 1296 C_officinalis Duplicated BUSCO genes loaded from C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.busco5.tsv
[Fri Jun 7 17:33:42 2024] #BUSCO 1296 C_officinalis Duplicated BUSCO genes following filtering to 5 sequences.
[Fri Jun 7 17:33:42 2024] #TIDK 4 TIDK telomere windows loaded from C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis_telomeric_repeat_windows.csv
[Fri Jun 7 17:33:42 2024] #TIDK 4 TIDK telomere windows loaded from C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis_telomeric_repeat_windows.csv
[Fri Jun 7 17:33:42 2024] #FT 1 features loaded from C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.gaps.tdt
[Fri Jun 7 17:33:43 2024] #FT 305504 features loaded from C:/Users/kavit/OneDrive/Documents/Genome/C_officinalis.tsv
[Fri Jun 7 17:33:43 2024] Cexcelsa_scaf_1...
[Fri Jun 7 17:33:43 2024] #SEQS 1 Cexcelsa_scaf_1 sequences loaded from C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.telomeres.tdt
[Fri Jun 7 17:33:43 2024] #SEQS 1 Cexcelsa_scaf_1 sequences meet minlen cutoff of 0 bp
[Fri Jun 7 17:33:43 2024] #TIDK 3 TIDK telomere windows loaded from C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1_telomeric_repeat_windows.csv
[Fri Jun 7 17:33:43 2024] #FT 1 features loaded from C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.gaps.tdt
[Fri Jun 7 17:33:44 2024] #FT 149611 features loaded from C:/Users/kavit/OneDrive/Documents/Genome/Cexcelsa_scaf_1.tsv
[Fri Jun 7 17:33:44 2024] #SEQS 6 sequences loaded in total.
[Fri Jun 7 17:33:44 2024] #BUSCO 473 Complete BUSCO genes loaded in total.
[Fri Jun 7 17:33:44 2024] #DUPL 1296 Duplicated BUSCO genes loaded in total.
[Fri Jun 7 17:33:44 2024] #TIDK 7 TIDK telomere windows loaded in total.
[Fri Jun 7 17:33:44 2024] #FT 455115 features loaded in total.
[Fri Jun 7 17:33:44 2024] #GAPS 2 assembly gaps loaded in total.
[Fri Jun 7 17:33:44 2024] #BUSCO 473 BUSCO genes for loaded sequences.
Joining with by = join_by(BuscoID)
Error in $<-.data.frame(*tmp*, Strand, value = "-") : replacement has 1 row, data has 0
Calls: $<- -> $<-.data.frame
Execution halted

slimsuite commented 3 months ago

Can you check whether the first genes in the BUSCO file(s) are rated as Missing? Probably the C_officinalis results, given its low Completeness. If so, then removing these genes or moving one of the Fragmented/Complete ones to the top should fix the issue. (I can't remember if I fixed this bug or not.)
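One quick way to run this check is sketched below in Python. It is only a diagnostic, not part of chromsyn: the function name `leading_missing` is hypothetical, and it assumes the usual BUSCO v5 full-table layout, with tab-separated columns, comment lines starting with `#`, and the status in the second column.

```python
def leading_missing(busco_tsv):
    """Count the Missing rows that appear before the first non-Missing row.

    Assumes a BUSCO full table: tab-separated, '#' comment lines,
    and the status (Complete, Duplicated, Fragmented, Missing) in column 2.
    """
    count = 0
    with open(busco_tsv) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header comments and blank lines
            fields = line.rstrip("\n").split("\t")
            status = fields[1] if len(fields) > 1 else ""
            if status != "Missing":
                break  # first gene with a real hit: stop counting
            count += 1
    return count
```

A result greater than zero means the table starts with that many Missing genes, which are the rows to remove or move down when trying the workaround above.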

mzywj2 commented 3 months ago

Hi,

Yes, the first gene in both the C_officinalis and the Cexcelsa_scaf_1 BUSCO files is rated Missing. I removed the first gene that was rated as Missing (I've attached the BUSCO files with it removed); however, I'm still getting the same error:

Error in $<-.data.frame(*tmp*, Strand, value = "-") : replacement has 1 row, data has 0
Calls: $<- -> $<-.data.frame
Execution halted

Is there anything else I can do to remove this error and get an output?

Thank you


slimsuite commented 2 months ago

Just to clarify... you removed all the Missing genes until you reached a not-Missing one?

The easiest thing might be to email me your input files (rich.edwards@uwa.edu.au) and I will see if I can recreate the error. I don't think attachments from the emails get attached to the issue on GitHub.

slimsuite commented 2 months ago

I received your email. I will take a look.

mzywj2 commented 2 months ago

Thank you so much!


mzywj2 commented 2 months ago

Hi,

I understand you might be busy, but just wondering if there's any update on the error.

Thank you


slimsuite commented 2 months ago

Hi, I made new FOFN files and it ran fine. I will send you an email now with the files and the command I used. It looks like you might have polyploidy in C_officinalis, so I will also send a suggestion for exploring this, in case it is useful. Rich

slimsuite commented 2 months ago

PS. I ran it with the full BUSCO file ("with_missing") and it was fine, so the Missing genes were not the issue.

PPS. I note that there are a LOT of features in the features TSV file: 455,115 in total. I am not sure what these are, but having this many might cause issues and certainly won't plot in a useful way. I'd recommend running without the ft.fofn file.