kordk / torch-ecpg

(GPU accelerated) eCpG mapper
BSD 3-Clause "New" or "Revised" License

Simulated dataset #8

Closed kordk closed 1 year ago

kordk commented 1 year ago

To benchmark the CPU versus GPU performance of the eQTM mapping, a simulated dataset will be created.

Simulation dataset guidelines:

kordk commented 1 year ago

Just an FYI: Ritu is working on this project.

kordk commented 1 year ago

After discussion, and given that the objective of this part of the project is to determine scalability, the simulated dataset will be generated using a bootstrap approach. Unlike randomly generated data, a bootstrap sample preserves the correlation structure of the original data, which could matter if the performance of the tool is affected by the data. The GTP dataset is sufficiently large to support a bootstrap approach.

olshena commented 1 year ago

The bootstrap approach is to sample people with replacement. So in each simulation some members of the original data will be represented more than once while others will not be represented at all. Sampling in this way maintains correlation structure.
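
As a rough illustration of the sampling described above, here is a minimal Python sketch. It assumes the data are stored as matrices with one row per feature and one column per person. The function names, the toy values, and the `n_draws` parameter are hypothetical and are not taken from torch-ecpg. The same drawn indices are applied to both matrices so that each simulated person keeps their own methylation and expression values together.

```python
import random


def bootstrap_indices(n_people, n_draws, seed=None):
    """Draw n_draws person indices from range(n_people) with replacement."""
    rng = random.Random(seed)
    return [rng.randrange(n_people) for _ in range(n_draws)]


def resample_columns(matrix, indices):
    """Pick columns (people) from a matrix stored as a list of feature rows."""
    return [[row[i] for i in indices] for row in matrix]


# Toy stand-ins for the real data (hypothetical values):
# rows are features, columns are people.
methylation = [[0.1, 0.5, 0.9, 0.3],
               [0.2, 0.4, 0.8, 0.6]]
expression = [[1.0, 5.0, 9.0, 3.0]]

# Drawing more people than the original cohort holds gives a larger
# simulated dataset, which is what a scalability benchmark would need.
idx = bootstrap_indices(n_people=4, n_draws=8, seed=0)

# Reuse the same indices for both matrices so each simulated person
# carries a matched methylation/expression pair.
sim_methylation = resample_columns(methylation, idx)
sim_expression = resample_columns(expression, idx)
```

Because whole people are resampled rather than individual values, every simulated column is an intact copy of an observed person, so the relationships between features within a person carry over unchanged.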

kordk commented 1 year ago

Closed