In readme.md section 3.1 it is stated that:
**3.1 GENESPACE-readable annotation format**
**For each genome, GENESPACE needs:**
**bed formatted coordinates of each gene (chr, start, end, name), other fields are allowed, but will be ignored by GENESPACE**
This suggests that when a .bed file with more than 4 columns is supplied, the extra columns will not affect the reading of the file, as long as the ID column matches the .fasta header.
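The ID-matching condition above can be checked from the command line before running GENESPACE. The following is a minimal sketch, not part of GENESPACE: the file names and sample records are hypothetical, and it assumes the gene ID is the first whitespace-delimited token of each FASTA header and the fourth tab-separated column of the .bed file.

```shell
# Hypothetical sample data; replace with your own .bed and .fasta files.
workdir=$(mktemp -d)
printf 'chr1\t100\t500\tgeneA\nchr1\t800\t1200\tgeneB\n' > "$workdir/genes.bed"
printf '>geneA\nMKV\n>geneC\nMLL\n' > "$workdir/peptides.fa"

# Collect the IDs from column 4 of the bed file.
cut -f 4 "$workdir/genes.bed" | sort -u > "$workdir/bed_ids.txt"

# Collect the IDs from the FASTA headers (first token after '>').
grep '^>' "$workdir/peptides.fa" | sed 's/^>//' | awk '{ print $1 }' | sort -u > "$workdir/fa_ids.txt"

# comm needs sorted input: -23 keeps lines unique to the first file,
# -13 keeps lines unique to the second file.
missing_in_fasta=$(comm -23 "$workdir/bed_ids.txt" "$workdir/fa_ids.txt")
missing_in_bed=$(comm -13 "$workdir/bed_ids.txt" "$workdir/fa_ids.txt")

echo "In bed but not in fasta: $missing_in_fasta"
echo "In fasta but not in bed: $missing_in_bed"
```

If both lists are empty, the IDs agree and any remaining read failure is more likely to come from the file layout than from the names.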
I used such a file, `C_australis_wide.bed`, and received the following error when trying to read it using `init_genespace()`:
`$ operator is invalid for atomic vectors`
Following the call stack, I reached the `read_bed()` function in `Utils.R`.
I generated a narrow copy of the .bed file in question, containing only the first four columns, using the following command:
`cut -f 1-4 C_australis_wide.bed > C_australis_narrow.bed`
I then tested the behaviour of `read_bed()` on both files.
When run on the wide file, `read_bed()` returns `logical(0)`, which causes the subsequent error in `init_genespace()`. Using the narrow file fixes the error.
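For anyone hitting the same error, the workaround can be sketched as a short pre-flight check: count the columns, then trim to the four that the readme names. This is an illustrative sketch only. The sample file below is a hypothetical stand-in for `C_australis_wide.bed`, and it assumes a tab-separated file with no header or comment lines.

```shell
# Hypothetical 6-column sample standing in for the wide .bed file.
workdir=$(mktemp -d)
printf 'chr1\t100\t500\tgeneA\t0\t+\nchr1\t800\t1200\tgeneB\t0\t-\n' > "$workdir/wide.bed"

# Report the distinct per-line column counts found in the file.
wide_cols=$(awk -F'\t' '{ print NF }' "$workdir/wide.bed" | sort -u)
echo "wide.bed columns: $wide_cols"

# Keep only chr, start, end, name (columns 1-4).
cut -f 1-4 "$workdir/wide.bed" > "$workdir/narrow.bed"

# Confirm the trimmed file has exactly four columns on every line.
narrow_cols=$(awk -F'\t' '{ print NF }' "$workdir/narrow.bed" | sort -u)
echo "narrow.bed columns: $narrow_cols"
```

A single value from the column count means every line has the same number of fields. More than one value would point to a malformed line rather than to extra columns.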
I believe the documentation should be updated to reflect that GENESPACE cannot select the required .bed columns by itself. The `suppressWarnings()` and `suppressMessages()` calls in `read_bed()` should also be removed, so that the user is told what the problem is instead of having to follow the call stack to find such a trivial issue. I have attached the files for replication purposes: GENESPACE_bug_report.zip
Thanks for this. I will update the documentation. But keep in mind, this is not a generalizable function (yet), and is ad hoc for the format specified in the readme (4-column bed).