statgen / pheweb

A tool to build a website to browse hundreds or thousands of GWAS.
MIT License

after i renamed the input GWAS files #228

Open jielab opened 5 months ago

jielab commented 5 months ago

Hi,

I previously used PheWeb to process some large GWAS files. Now my project manager has decided to rename some of the input GWAS files, for example renaming LDL.gwas.gz to LDL.2023.gwas.gz.

If I rerun pheweb now, it will treat LDL.2023.gwas.gz as a new file and begin to re-process it. Is there a way to let PheWeb know that some files were only renamed, so that it won't re-process them?

Thanks!

JH

pjvandehaar commented 5 months ago

No.


pjvandehaar commented 5 months ago

Oh, actually there is. You get to choose the assoc_files field in pheno-list.json.
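For readers following along, here is a minimal sketch of what that could look like in practice. It assumes pheno-list.json is a JSON list of objects with "assoc_files" (a list of paths) and "phenocode" keys; the helper name update_assoc_files and the renames mapping are hypothetical and not part of PheWeb. It only swaps file names inside assoc_files and leaves every phenocode untouched.

```python
import json


def update_assoc_files(phenolist, renames):
    """Return a copy of phenolist with renamed input files substituted.

    `renames` maps old file names to new file names (hypothetical helper
    input, not a PheWeb option). Phenocodes are left exactly as they were.
    """
    def rename(path):
        # Split off the directory part and swap only the file name.
        head, sep, name = path.rpartition("/")
        return head + sep + renames.get(name, name)

    updated = []
    for pheno in phenolist:
        new_pheno = dict(pheno)  # shallow copy so the input is not mutated
        new_pheno["assoc_files"] = [rename(p) for p in pheno["assoc_files"]]
        updated.append(new_pheno)
    return updated


if __name__ == "__main__":
    # In-memory example; in practice the list would be loaded from
    # pheno-list.json with json.load and written back with json.dump.
    phenolist = [{"assoc_files": ["/data/LDL.gwas.gz"], "phenocode": "LDL"}]
    renames = {"LDL.gwas.gz": "LDL.2023.gwas.gz"}
    print(json.dumps(update_assoc_files(phenolist, renames), indent=1))
```

Whether pheweb then skips the renamed file depends on how it decides that a phenotype is already processed, so it is worth trying this on a single phenotype first.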

jielab commented 5 months ago

Thanks, Peter!

Taking my example above: I renamed LDL.gwas.gz to LDL.2023.gwas.gz.

In the pheno-list.json file, if I change LDL.gwas.gz to LDL.2023.gwas.gz in the assoc_files field but keep the phenocode field unchanged, I guess pheweb is smart enough to check the timestamp of LDL.2023.gwas.gz, determine that it is not a new file, and therefore not re-process it.

A few days later, I get more GWAS data. I always use pheweb phenolist glob --star-is-phenocode "GWAS-DIR/*.gz" to create an updated pheno-list.json file. This time, the updated pheno-list.json file will have a new phenocode of LDL.2023.gwas. I guess this time pheweb will re-process it, even though it is still the same GWAS file.

Sorry to ask this seemingly complicated question. I was hoping there would be a way to batch update file names in one place, so that my renamed GWAS files don't get re-processed. If there is no easy solution, I will simply re-process them.

Best regards, JH

pjvandehaar commented 5 months ago

I don’t fully understand your exact situation, but your understanding of pheweb’s processing sounds correct to me.

On Thu, Jun 13, 2024 at 1:32 AM Jie Huang wrote:

But Pheweb will still re-process this LDL.2023.gwas.gz, because this file does NOT exist in output directories such as generated-by-pheweb/parsed
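To see ahead of time which phenotypes would fall into this case, a small check along these lines might help. It is only a sketch and assumes that each output file under generated-by-pheweb/parsed is named exactly after its phenocode; that layout is an assumption here, so verify it against your own output directory. The function name phenocodes_missing_output is hypothetical.

```python
import tempfile
from pathlib import Path


def phenocodes_missing_output(phenolist, parsed_dir):
    """List phenocodes that have no file of the same name in parsed_dir.

    Assumes (unverified) that parsed outputs are named after the phenocode.
    """
    parsed_dir = Path(parsed_dir)
    return [
        pheno["phenocode"]
        for pheno in phenolist
        if not (parsed_dir / pheno["phenocode"]).exists()
    ]


if __name__ == "__main__":
    # Self-contained demo using a temporary directory in place of
    # generated-by-pheweb/parsed.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "LDL.gwas").touch()  # stands in for an existing output
        phenolist = [
            {"assoc_files": ["GWAS-DIR/LDL.2023.gwas.gz"], "phenocode": "LDL.gwas"},
            {"assoc_files": ["GWAS-DIR/LDL.2023.gwas.gz"], "phenocode": "LDL.2023.gwas"},
        ]
        print(phenocodes_missing_output(phenolist, tmp))
```

Any phenocode reported by this check has no matching output, so it is a candidate for being processed again.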


jielab commented 5 months ago

Thanks, Peter!

My situation is this: let's say that previously I had 100 GWAS and ran pheweb process on them. It took a few days... Now my group has decided to rename those GWAS, for example by adding "2023" or "2024" to the original GWAS names.

In the future, my group will have more GWAS, with names containing "2025" or "2026". And I always use phenolist glob --star-is-phenocode "GWAS-DIR/*.gz" to automatically generate and update the pheno-list.json file.

I am trying to use the new naming system without spending a few more days re-processing those 100 GWAS in pheweb.

Anyway, I guess the easiest way is to simply re-process everything using the renamed GWAS files.

Best regards, JH
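One possible way to combine the new file names with the old phenocodes is to generate pheno-list.json with a small script instead of --star-is-phenocode. The sketch below assumes the year is a dot-delimited four-digit token in the file name (LDL.2023.gwas.gz) and that the earlier phenocodes were the file names minus ".gz" (LDL.gwas). The helpers stable_phenocode and build_phenolist are hypothetical and not part of PheWeb.

```python
import re

# A dot followed by a four-digit year (19xx or 20xx), ending at a dot or
# at the end of the name. Assumed naming scheme, adjust as needed.
YEAR_TAG = re.compile(r"\.(?:19|20)\d{2}(?=\.|$)")


def stable_phenocode(path):
    """Derive a phenocode from a file path, ignoring the year tag."""
    name = path.rpartition("/")[2]  # drop the directory part
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    return YEAR_TAG.sub("", name, count=1)


def build_phenolist(paths):
    """Build pheno-list entries, refusing two files with the same phenocode."""
    phenolist, seen = [], {}
    for path in sorted(paths):
        code = stable_phenocode(path)
        if code in seen:
            raise ValueError(
                f"{path} and {seen[code]} both map to phenocode {code!r}"
            )
        seen[code] = path
        phenolist.append({"assoc_files": [path], "phenocode": code})
    return phenolist


if __name__ == "__main__":
    import json

    paths = ["GWAS-DIR/LDL.2023.gwas.gz", "GWAS-DIR/HDL.2024.gwas.gz"]
    print(json.dumps(build_phenolist(paths), indent=1))
```

Note that this strips the year from every file, including future "2025" or "2026" ones, and it stops with an error if two years of the same trait would collide. If new GWAS should keep the year in their phenocode, the stripping would need to be limited to the files that were renamed.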