maximilianh / crisporWebsite

All source code of the crispor.org website
http://crispor.org

I ran into a problem that the deep-cpf1 score generated from the command line is different from the results of the web version #61

Open guomaoping opened 1 year ago

guomaoping commented 1 year ago

Hello, I ran into a problem: the DeepCpf1 score generated from the command line differs from the result of the web version (http://crispor.tefor.net/). Have you encountered this before, and what might be the reason? Thanks

maximilianh commented 1 year ago

I’m not entirely surprised, small differences are likely… Do you have the example sequence and the exact difference?

Also, which versions are you using? At the moment, the GitHub version is Python 3, while the website is still on Python 2 (to change very soon).

guomaoping commented 1 year ago

Thank you for your reply! I used a 1000 bp sequence as input. The command line is as follows: "python3 $path/crisporWebsite/crispor.py --pam=ATTN --effScores=cpf1 --tempDir=./tmpdir hg38 1000bp.fa ATTN_1000bp_cli_scoreGuides.tsv"

The input sequence is as follows: "ATTAATACTTTTAACAATTGTAGTATATAAAAAAGGGAGTAACCGAAAACGGTCGGGACCGAAAACGGTGTATATAAAAGATGTGAGAAACACACCACAATACTATGGCGCGCTTTGAGGATCCAACACGGCGACCCTACAAGCTACCTGATCTGTGCACGGAACTGAACACTTCACTGCAAGACATAGAAATAACCTGTGTATATTGCAAGACAGTATTGGAACTTACAGAGGTATTTGAATTTGCATTTAAAGATTTATTTGTGGTGTATAGAGACAGTATACCGCATGCTGCATGCCATAAATGTATAGATTTTTATTCTAGAATTAGAGAATTAAGACATTATTCAGACTCTGTGTATGGAGACACATTGGAAAAACTAACTAACACTGGGTTATACAATTTATTAATAAGGTGCCTGCGGTGCCAGAAACCGTTGAATCCAGCAGAAAAACTTAGACACCTTAATGAAAAACGACGATTTCACAACATAGCTGGGCACTATAGAGGCCAGTGCCATTCGTGCTGCAACCGAGCACGACAGGAACGACTCCAACGACGCAGAGAAACACAAGTATAATATTAAGTATGCATGGACCTAAGGCAACATTGCAAGACATTGTATTGCATTTAGAGCCCCAAAATGAAATTCCGGTTGACCTTCTATGTCACGAGCAATTAAGCGACTCAGAGGAAGAAAACGATGAAATAGATGGAGTTAATCATCAACATTTACCAGCCCGACGAGCCGAACCACAACGTCACACAATGTTGTGTATGTGTTGTAAGTGTGAAGCCAGAATTGAGCTAGTAGTAGAAAGCTCAGCAGACGACCTTCGAGCATTCCAGCAGCTGTTTCTGAACACCCTGTCCTTTGTGTGTCCGTGGTGTGCATCCCAGCAGTAAGCAACAATGGCTGATCCAGAAGGTACAGACGGGGAGGGCACGGGTTGTAACGGCTGGTTTTATGTACAAGCTATTGTAGACAAAAAAACAGGA"

The top 5 results of the command line are as follows:

1000bp_test 1forw   ATTAATACTTTTAACAATTGTAGTATA -1  -1  30      NotEnoughFlankSeq   GrafOK
1000bp_test 17forw  ATTGTAGTATATAAAAAAGGGAGTAAC -1  -1  14      NotEnoughFlankSeq   GrafOK
1000bp_test 98rev   ATTGTGGTGTGTTTCTCACATCTTTTA -1  -1  15      62.107296   tt
1000bp_test 190rev  ATTTCTATGTCTTGCAGTGAAGTGTTC -1  -1  7       38.360176   tt
1000bp_test 205forw ATTGCAAGACAGTATTGGAACTTACAG -1  -1  2       67.37283    GrafOK

The top 5 results of the web version are as follows:

guideId targetSeq mitSpecScore cfdSpecScore offtargetCount targetGenomeGeneLocus DeepCpf1-Score grafType

1forw   ATTAATACTTTTAACAATTGTAGTATA -1  -1  30      NotEnoughFlankSeq   GrafOK
17forw  ATTGTAGTATATAAAAAAGGGAGTAAC -1  -1  14      NotEnoughFlankSeq   GrafOK
98rev   ATTGTGGTGTGTTTCTCACATCTTTTA -1  -1  15      **44.11671**    tt
190rev  ATTTCTATGTCTTGCAGTGAAGTGTTC -1  -1  7       **25.683983**   tt
205forw ATTGCAAGACAGTATTGGAACTTACAG -1  -1  2       **51.02048**    GrafOK

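environment.txt https://github.com/maximilianh/crisporWebsite/files/11412056/environment.txt
ATTN_1000bp_cli_scoreGuides.csv https://github.com/maximilianh/crisporWebsite/files/11412066/ATTN_1000bp_cli_scoreGuides.csv
ATTN_1000bp_web_scoreGuides.csv https://github.com/maximilianh/crisporWebsite/files/11412067/ATTN_1000bp_web_scoreGuides.csv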
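To put numbers on the discrepancy, here is a minimal sketch that compares the three DeepCpf1 scores reported in both listings above. The dictionaries are typed in by hand from the posted rows, and `compare_scores` is a hypothetical helper, not part of CRISPOR.

```python
# Scores copied by hand from the two result listings above
# (guideId -> DeepCpf1 score). Guides without a score are left out.
cli_scores = {"98rev": 62.107296, "190rev": 38.360176, "205forw": 67.37283}
web_scores = {"98rev": 44.11671, "190rev": 25.683983, "205forw": 51.02048}


def compare_scores(a, b):
    """Return {guideId: (difference, ratio)} for guides present in both dicts.

    Hypothetical helper for illustration; assumes no score in b is zero.
    """
    result = {}
    for guide_id in sorted(a.keys() & b.keys()):
        diff = a[guide_id] - b[guide_id]
        ratio = a[guide_id] / b[guide_id]
        result[guide_id] = (diff, ratio)
    return result


comparison = compare_scores(cli_scores, web_scores)
for guide_id, (diff, ratio) in comparison.items():
    print(f"{guide_id}\tdiff={diff:.3f}\tratio={ratio:.3f}")
```

For these three guides the command-line scores are higher by roughly 12 to 18 points, and the ratio between the two is not constant, so the gap does not look like a simple rescaling. Three rows are of course too few to conclude much more than that.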

maximilianh commented 1 year ago

It's strange that the sequence does not match the hg38 genome. Is this intentional?

Anyhow, I finally got the Python3 version installed and it gives the same scores as the Python2 version. See http://crispor.gi.ucsc.edu/crispor.py?batchId=ASazfpnUeH14sdYvIC02 (this is the version of the software that is also in the github master branch. It's not the crispor.tefor.net version.)

I wonder if this has something to do with how you set up the machine learning packages... Did you compare the versions?

guomaoping commented 1 year ago

Thank you very much for your reply. This sequence is an excerpt from HPV18. I have compared the versions of the deep learning packages in my installation environment, and they are the same as those specified in the "requirements.txt" file. The package versions are: keras==2.11.0, scikit-learn==1.2.2, scipy==1.10.1, tensorflow==2.11.0

However, I noticed that line 54 of the "INSTALL.md" file reads "I am using keras/tensorflow 2.1.1. I hope that the exact version is not important," which is inconsistent with "tensorflow==2.11.0, keras==2.11.0" in the "requirements.txt" file. Which version should I choose?
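A check like the one described above can be scripted. The following is a minimal sketch using only the standard library; `parse_pins`, `installed_version` and `find_mismatches` are hypothetical helper names, and the pins are the ones quoted in this comment rather than read from the actual requirements.txt.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Parse 'name==version' lines into a dict; other lines are ignored."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, installed)} for packages whose versions differ."""
    mismatches = {}
    for name, pinned in pins.items():
        found = lookup(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


# Pins as quoted in the comment above; a real check would read requirements.txt.
pins = parse_pins(
    "keras==2.11.0\nscikit-learn==1.2.2\nscipy==1.10.1\ntensorflow==2.11.0"
)
for name, (pinned, found) in find_mismatches(pins).items():
    print(f"{name}: pinned {pinned}, installed {found}")
```

Running the same script on two machines and comparing the output makes it easier to see which package versions actually differ between two installations.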

In addition, I compared the DeepCpf1 scores of the crispor.tefor.net version and the crispor.gi.ucsc.edu version, and found that they are indeed different.

The result from the command-line version that I calculated is consistent with the crispor.gi.ucsc.edu version, but different from the crispor.tefor.net version. Here are the links to the results: Result of the crispor.tefor.net version: http://crispor.tefor.net/crispor.py?batchId=ASazfpnUeH14sdYvIC02

Results of the crispor.gi.ucsc.edu version (the one you provided): http://crispor.gi.ucsc.edu/crispor.py?batchId=ASazfpnUeH14sdYvIC02

Thank you for your help and guidance on this matter.

maximilianh commented 1 year ago

Hm. This is disturbing. I am traveling right now and won't be able to look into this. Do you know which of the two scores is "correct"? Is there something one could compare to?

This problem comes up because I'm moving everything to a new server and Python3.

I just checked: the gi.ucsc.edu server is running keras='2.9.0' and '2.9.1'. I think I downgraded the version, but I'm blanking now on why... I updated the requirements.txt in Git. This will not solve your problem...

On the old server, crispor.tefor.net, I'm running keras 2.1.5 and tensorflow 1.7.0, but I have no idea if that's related in any way. I guess we have to find a few example scores or another website to compare to, to make sure that the scores are correct. I had no idea that they could change so easily, I had assumed the package to be solid and stable now.

maximilianh commented 1 year ago

What you can do is run the original script, crispor/bin/deepCpf1/DeepCpf1-orig.py. That's the algorithm without any modifications, and it has a command line interface. This will make sure that I didn't screw anything up. It should give you the same scores as in CRISPOR.

If that works, you can check that a single score matches what the authors got, at http://data.snu.ac.kr/DeepCpf1

The weights are in crispor/bin/deepCpf1/weights/Seq_deepCpf1_weights.h5, but I didn't touch these, as far as I know.
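One way to rule out a modified weights file is to compare its checksum across installations. This is only a sketch: `sha256_of_file` is a hypothetical helper, and no reference checksum is given in this thread, so the digests can only be compared between your own machines or checkouts.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example call, assuming the current directory contains the crispor checkout:
# print(sha256_of_file("crispor/bin/deepCpf1/weights/Seq_deepCpf1_weights.h5"))
```

If the digests are identical on both installations, the weights themselves are unlikely to explain the difference in scores.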

maximilianh commented 1 year ago

Hi, any news from this? Were you able to confirm if the crispor.tefor.net scores are the correct ones?

I have another score, the saCas9 score, that has the same problem. I think I'll have to require Python 2. The authors of that algorithm also use the Microsoft code, and the Microsoft people are no longer available to update the pickle files.

maximilianh commented 1 year ago

Two other scores have similar problems. I think I'll have to retain Python 2 as a requirement. Did you try the Python 2 version?
