building database

Email from user

I went through the tutorial on GitHub and created my own database for mus_musculus GRCm39.106. I then compared it to the one you have online (GRCm38.83), and I noticed that:

For most chromosomes, the newer version has larger files, which I think makes sense because the genome was updated.
The online one has a total of 137 files in the folder, while the one I created has only 119.
When running SIFT4G with the new database on previously run samples, I get fewer hits. For example, in one VCF I tested, the old database gave 53 sites in the output, while the new one gave only 39.

I have attached screenshots of the errors I got during database creation. It seems that some files (I am not sure what these are) weren't in the directory, so they couldn't be processed. I also compared the "CHECK_GENES.LOG" files, and the online one has six more of these GL or JH records, which accounts for the 18 missing files (137 - 119), as each record gets three files. My questions are:

What are these files, and how important are they?
Any idea why I am missing six from the new assembly?
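
A quick way to see exactly which records are present in one database folder but not the other is to group the files by record name and compare the two sets. The following is only a minimal sketch: the three per-record suffixes in `ASSUMED_SUFFIXES` are an assumption about how the database files are named (they are not stated in this issue), and the function names and folder names are hypothetical.

```python
from pathlib import Path

# ASSUMPTION: each record (chromosome or scaffold) produces three files with
# these suffixes. Adjust the tuple to match the files actually in the folder.
ASSUMED_SUFFIXES = (".gz", ".regions", "_SIFTDB_stats.txt")


def record_names(db_dir):
    """Return the set of record names found in db_dir."""
    names = set()
    for path in Path(db_dir).iterdir():
        if not path.is_file():
            continue
        for suffix in ASSUMED_SUFFIXES:
            if path.name.endswith(suffix):
                # Strip the suffix rather than splitting on ".", because
                # scaffold names can themselves contain a dot.
                names.add(path.name[: -len(suffix)])
                break
    return names


def compare_databases(old_dir, new_dir):
    """Return (records only in old_dir, records only in new_dir), each sorted."""
    old, new = record_names(old_dir), record_names(new_dir)
    return sorted(old - new), sorted(new - old)
```

For example, `only_old, only_new = compare_databases("GRCm38.83", "GRCm39.106")` (hypothetical folder names) would list the records that account for the difference in file counts.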

One suspicion I had was that my internet connection was so slow when downloading the Ensembl files that some may have been left out. I am planning to repeat the download somewhere with a better connection to test this. But I also wanted to check whether you had any additional ideas.
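
Before repeating the whole download, the slow-connection hypothesis can be partly tested by checking whether any downloaded file was cut short. The sketch below assumes the downloaded Ensembl files are gzip-compressed and sit in a single folder; the function name is hypothetical. It only detects truncated or corrupt `.gz` files, not files that were never downloaded at all.

```python
import gzip
import zlib
from pathlib import Path


def find_bad_gzip_files(directory):
    """Return names of .gz files in directory that cannot be fully decompressed."""
    bad = []
    for path in sorted(Path(directory).glob("*.gz")):
        try:
            with gzip.open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files are never held in memory whole.
                while handle.read(1024 * 1024):
                    pass
        except (OSError, EOFError, zlib.error):
            # A truncated download typically fails with EOFError; corrupt data
            # raises gzip.BadGzipFile (an OSError subclass) or zlib.error.
            bad.append(path.name)
    return bad
```

An empty list means every `.gz` file in the folder decompressed to the end; any names returned are candidates for re-downloading.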

pauline-ng / SIFT4G_Create_Genomic_DB

building database #63