morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License

`get_ssm_by_region` dies if specified region has no mutations #235

Open lkhilton opened 1 year ago

lkhilton commented 1 year ago

I haven't fully debugged this, but I've been trying to retrieve variants for a small aSHM region. It works for genomes, but not capture:

regions <- GAMBLR:::process_regions(only_regions = "BCL6")

dim(get_ssm_by_region(
    region = regions$regions[2],
    seq_type = "genome"
))
# [1] 239  45

get_ssm_by_region(
    region = regions$regions[1],
    seq_type = "capture"
) %>% dim()

# [1] 543  45

get_ssm_by_region(
    region = regions$regions[2],
    seq_type = "capture"
)

# Error in methods::as(data[[i]], colClasses[i]) : 
#   no method or default for coercing “character” to “l”

I believe the capture MAF file has no rows matching the region described by `regions$regions[2]`, so vroom is choking on the specified column types. Can you please investigate? Ideally this would be robust even for small regions where no mutations are found.
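To make the request concrete, here is a minimal base-R sketch of the behaviour I'd expect: when a region yields no lines, return a zero-row data frame with the right column types instead of trying to coerce nothing. This is not how `get_ssm_by_region` is implemented. The helper name `parse_region_lines` and the three-column `maf_classes` layout are hypothetical and only for illustration (a real MAF has many more columns), and it uses `read.table` rather than vroom.

```r
# Hypothetical helper, not part of GAMBLR: turn tab-delimited lines into a
# typed data frame, returning a zero-row frame when there are no lines.
parse_region_lines <- function(lines, col_classes) {
  if (length(lines) == 0) {
    # Build one zero-length vector per column so the column types survive.
    empty_cols <- lapply(col_classes, function(cls) vector(cls, length = 0))
    return(as.data.frame(empty_cols, stringsAsFactors = FALSE))
  }
  utils::read.table(
    text = lines, sep = "\t", quote = "", comment.char = "",
    col.names = names(col_classes), colClasses = unname(col_classes),
    stringsAsFactors = FALSE
  )
}

# Assumed, simplified column layout (illustrative values only).
maf_classes <- c(
  Chromosome = "character",
  Start_Position = "integer",
  Hugo_Symbol = "character"
)

hits <- parse_region_lines("chr3\t187440000\tBCL6", maf_classes)
none <- parse_region_lines(character(0), maf_classes)
dim(none)
```

With a guard like this, downstream code can call `dim()` or `nrow()` on the result without special-casing empty regions.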

rdmorin commented 1 year ago

@HoumanLM Could your new indexing approach using the Rsamtools package help us resolve this?

HoumanLM commented 1 year ago

@rdmorin For the tabix work we discussed earlier, I used Rsamtools to write a function similar to `get_ssm_by_region`, so it seems it could solve this issue.
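As a rough illustration of the idea (a sketch under assumptions, not the function mentioned above), a tabix query through Rsamtools could look like the following. The helper name `fetch_region_lines` is hypothetical. It assumes `maf_path` points to a bgzip-compressed, tabix-indexed MAF, and that `region` is a string such as "chr3:100-200" whose chromosome naming matches the index.

```r
# Hypothetical helper, for illustration only.
# Assumes `maf_path` is a bgzip-compressed MAF with a tabix (.tbi) index.
fetch_region_lines <- function(maf_path, region) {
  if (!requireNamespace("Rsamtools", quietly = TRUE) ||
      !requireNamespace("GenomicRanges", quietly = TRUE)) {
    stop("Rsamtools and GenomicRanges are required for this sketch")
  }
  query <- GenomicRanges::GRanges(region)  # e.g. "chr3:100-200"
  tabix <- Rsamtools::TabixFile(maf_path)
  hits <- Rsamtools::scanTabix(tabix, param = query)
  # scanTabix returns one character vector per queried range;
  # a range with no records is expected to come back as character(0).
  hits[[1]]
}
```

Because the query returns raw lines, the "no mutations" case becomes an empty character vector that can be checked with `length()` before any column types are applied.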