morinlab / GAMBLR

Set of standardized functions to operate with genomic data
https://morinlab.github.io/GAMBLR/
MIT License
3 stars 2 forks source link

Handling of bundled data through designated separate repo #150

Closed Kdreval closed 1 year ago

Kdreval commented 1 year ago

This is related to issues #147 and #149 Currently this set up supports use of ashm coordinates of specific version as GAMBLR is first loaded. The version is stored in config and can be modified on session-by-session, project-by-project basis. If this set up proves to be efficient, it will be extended to handle also lymphoma genes and other bundled GAMBLR data.

Kdreval commented 1 year ago

This is now tested and is ready for review

mattssca commented 1 year ago

Thanks for the updates Kostia, for some reason Git didn't notify me on email that you've updated this branch.

However, I cloned this branch to test the newly added collate_pga and was given an error in return when running the example in the docs. This is the error message that was returned:

> pga_metrics <- collate_pga(these_samples_metadata = meta)
Collating the PGA results ...

Error in UseMethod("filter") : 
no applicable method for 'filter' applied to an object of class "function" 

5. filter(., start <= arm_end & arm_start <= end)

4. arrange(., sample, chrom, start)

3. df %>% filter(start <= arm_end & arm_start <= end) %>% arrange(sample, 
    chrom, start) at utilities.R#3886

2. calculate_pga(this_seg = multi_sample_seg) at utilities.R#3726

1. collate_pga(these_samples_metadata = meta) 

I am not sure if this error is related to dplyr not being specified at the call of filter (line 3886) or if the reason is that the df(same line) is previously undefined, or if it's something completely different. But you might want to look into this. Let me know if you need any help testing.

Thanks,

Kdreval commented 1 year ago

Thanks Adam for catching this! Yes it was because of df was undefined but I have also specified the dplyr::filter() in the next line as it is a best practice. I have pushed the working version and it is going through the GitHub Actions - when passed the PR is ready :smile:

Thanks for the updates Kostia, for some reason Git didn't notify me on email that you've updated this branch.

However, I cloned this branch to test the newly added collate_pga and was given an error in return when running the example in the docs. This is the error message that was returned:

> pga_metrics <- collate_pga(these_samples_metadata = meta)
Collating the PGA results ...

Error in UseMethod("filter") : 
no applicable method for 'filter' applied to an object of class "function" 

5. filter(., start <= arm_end & arm_start <= end)

4. arrange(., sample, chrom, start)

3. df %>% filter(start <= arm_end & arm_start <= end) %>% arrange(sample, 
    chrom, start) at utilities.R#3886

2. calculate_pga(this_seg = multi_sample_seg) at utilities.R#3726

1. collate_pga(these_samples_metadata = meta) 

I am not sure if this error is related to dplyr not being specified at the call of filter (line 3886) or if the reason is that the df(same line) is previously undefined, or if it's something completely different. But you might want to look into this. Let me know if you need any help testing.

Thanks,

mattssca commented 1 year ago

Thanks Kostia! I will clone this branch and try it out!

Kdreval commented 1 year ago

Thanks Adam! Here is what I think:

  1. These have been added and documentation updated/regenerated :white_check_mark:

  2. Added the new parameter to estimate_purity, copy_number_vaf_plot, and updated the documentation :white_check_mark:

  3. I don't have local R/Rstudio and could not test the remote functionality. Could you help with this if you have the remote functionality setup and have testing case? :white_flag:

  4. Yes, this is expected. That sample is problematic and will be eventually kicked out of GAMBL. Here are more details. :white_check_mark:

  5. I noticed this example was removed too - yes. It does generates an error for me too when I test it as is - but it is not capture related or cnv related. The reason for this is that the region_bed argument supplied in this example contains a corrupted entry:

    171 7                               151832010    152133090 KMT2C      
    172 8                                 1993155      2113475 MYOM2      
    173 8   8640864 8751155 MFHAS1             NA           NA NA         
    174 8                                20054878     20084330 ATP6V1B2   
    175 8                                35092975     35654068 UNC5D   

    It of course is not producing a desired output (column in the matrix) and therefore there is mismatch in the length of names and columns - so the function errors. I created an issue related to this issue #193 and this can be addressed in a separate PR. :white_flag:

  6. That function is broken in a sense that it only works with battenberg outputs and needs a revamp on how it works. I have also created issue #194 and it can be addressed separately. :white_flag:

  7. Sure, I can update the flatfiles as well - only thinking it should be done after PR is merged so the code how those files are generated is on master and accessible to everyone/reproducible. :white_flag:

mattssca commented 1 year ago

Thanks for answering my questions Kostia. I can indeed test and address the remote functionality related to capture CNVs. I'll add support for this in my next PR. I can also look into the other outstanding items that you created issues for. I think this PR looks good, I'll approve the changes.

Kdreval commented 1 year ago

Regarding the Question 5 above, I have resolved it and closed the issue #193 . I will wait till the GitHub Actions pass on the latest commit for this branch and will then merge it to master. Thanks Adam!