Closed by Kdreval 1 year ago
At first glance, I like this solution, and I can definitely understand the issue you are describing here. I can make the necessary adjustments on my branch, run some tests, and push to my active PR. Thanks, Kostia!
Currently, `get_manta_sv()` calls `get_gambl_metadata()` internally even though `these_samples_metadata` is a specified argument used to determine which samples are missing from the flatfile. As a result, even when only one sample is missing from the flatfile, we collect and import bedpes from the entire GAMBL, and the function then subsets the resulting df to that one sample. This leads to an unnecessarily long processing time. This is a small example for a random sample missing from the flatfile:

This is a best-case scenario because during peak hours it can take ~4 min for one sample.
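To make the cost pattern described above concrete, here is a minimal, generic sketch in Python. It is not the GAMBLR code and does not show the actual fix: `BEDPE_STORE`, `load_bedpe`, `load_then_subset`, and `subset_then_load` are all hypothetical names invented for illustration. It only contrasts "read everything, then filter" with "filter the sample list, then read".

```python
# Hypothetical stand-in for per-sample bedpe files; not GAMBLR's real data layout.
BEDPE_STORE = {
    "sample_A": [("chr1", 100, "chr2", 200)],
    "sample_B": [("chr3", 300, "chr4", 400)],
    "sample_C": [("chr5", 500, "chr6", 600)],
}

load_calls = []  # records which samples were actually read


def load_bedpe(sample_id):
    """Pretend to read one sample's bedpe file (the slow step in practice)."""
    load_calls.append(sample_id)
    return [(sample_id, *row) for row in BEDPE_STORE[sample_id]]


def load_then_subset(all_sample_ids, wanted_ids):
    """Pattern described in the issue: read every sample, then filter."""
    rows = [row for s in all_sample_ids for row in load_bedpe(s)]
    return [row for row in rows if row[0] in wanted_ids]


def subset_then_load(all_sample_ids, wanted_ids):
    """Alternative pattern: restrict the sample list first, then read."""
    return [
        row
        for s in all_sample_ids
        if s in wanted_ids
        for row in load_bedpe(s)
    ]
```

Both functions return the same rows for the requested sample, but the second one only touches the files it needs, so the number of reads scales with the number of missing samples instead of the size of the whole cohort.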
The fix I think is to replace the line here with