jokergoo / rGREAT

GREAT Analysis - Functional Enrichment on Genomic Regions
https://jokergoo.github.io/rGREAT

Handling large data sets beyond the limits of allowed File size #22

Closed peranti closed 2 years ago

peranti commented 3 years ago

Hello @jokergoo,

Thanks for creating this package to use the GREAT tool.

According to the documentation here, GREAT supports up to 0.5M and 1M regions for test and background:

GREAT currently supports test files with up to 500,000 test regions and background files with up to 1,000,000 regions. Each must be less than 50 MB in size. Compressed data must decompress to plain text files at most 50 MB in size.

I would like to use the tool for 2.8M test regions against a background of 4M regions. Is it possible to run this test with the functionality already in the package? Or should I split the regions across multiple queries so that each one fits within the limits above?

I am also asking whether you would consider adding new functionality for this to the existing Bioconductor version of the package.

jokergoo commented 3 years ago

No, the rGREAT package only sends the data to the GREAT server. For your case, I would suggest randomly sampling a subset of regions for the enrichment analysis.
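The random-sampling suggestion could be sketched roughly as follows. This is a minimal, language-agnostic illustration in Python rather than anything from rGREAT itself: `subsample_regions` is a hypothetical helper, regions are assumed to be `(chrom, start, end)` tuples already loaded from a BED-like file, and the two limits are the ones quoted from the GREAT documentation earlier in this thread. It does not address how to keep a test set consistent with a subsampled background.

```python
import random

# Limits quoted from the GREAT documentation earlier in this thread.
MAX_TEST_REGIONS = 500_000
MAX_BACKGROUND_REGIONS = 1_000_000


def subsample_regions(regions, max_regions, seed=0):
    """Return at most max_regions regions, drawn uniformly without replacement.

    Hypothetical helper: the input order is preserved, and a fixed seed
    makes the draw reproducible.
    """
    regions = list(regions)
    if len(regions) <= max_regions:
        return regions
    rng = random.Random(seed)
    # Sample positions rather than items so the original order can be kept.
    keep = sorted(rng.sample(range(len(regions)), max_regions))
    return [regions[i] for i in keep]


if __name__ == "__main__":
    # Toy data standing in for a large region set (assumed layout).
    toy = [("chr1", i * 100, i * 100 + 50) for i in range(1000)]
    subset = subsample_regions(toy, max_regions=100, seed=42)
    print(len(subset))  # 100
```

For a real run, `max_regions` would be set to `MAX_TEST_REGIONS` or `MAX_BACKGROUND_REGIONS`, and the sampled regions written back to a BED file that also stays under the 50 MB size limit before being submitted.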

peranti commented 3 years ago

Thanks, @jokergoo, for your view. But how would that approach help with supplying my own background file instead of the one provided by the system?