[test on V5.3.1] Unexpected behavior when upgrading to V5.3.1

superkaiy commented 8 months ago

Hi guys, I ran some tests on engine V5.3.1.

Scanned file	Source in KB	Comment
1.py	file1	unmodified, just renamed
2.py	file2	contains part of file2
3.py	file1, file2	mix of file1 and file2

With V5.3.0, the results are as expected above, but with V5.3.1 they are not: the scans of 2.c and 3.c match nothing. Attachments: output_5.3.0.txt, output_5.3.1.txt, test_case.zip

mscasso-scanoss commented 7 months ago

Hi @superkaiy, thank you for reporting this issue. Attached, you will find the response from our servers. Unfortunately, we are using different knowledge bases, so I'm unable to reproduce your issue. However, I will be happy to assist you in identifying the root of this problem and confirming whether there is a bug or not.

Please run the 'scanoss' command on each file with "-q" added at the end to obtain the debug information, and share the output with me. Please do this for both v5.3.0 and v5.3.1. I look forward to your response.

Best regards, Mariano

Attachment: test_output.txt

superkaiy commented 7 months ago

Hi @mscasso-scanoss, my knowledge base contains the following two components:

repo: https://github.com/dpkp/kafka-python.git; branch: master; revision: 5bb126bf20bbb5baeb4e9afc48008dbe411631bc
repo: https://github.com/lencx/ChatGPT.git; branch: main; revision: de5c8f0f8770c0e836d808bede4ac50427611ff5

Debug information: debug_infor_v5.3.0.txt, debug_infor_v5.3.1.txt

mscasso-scanoss commented 7 months ago

Hi! Sorry for the delay. Please test the latest version and close the issue if you think the problem is solved. Best regards, Mariano

superkaiy commented 7 months ago

Hello @mscasso-scanoss, I tested the latest version, and the match percentage is ~25% even when the file has been modified very slightly:

insert, delete, or modify just one line at the beginning of the file, even if the line is blank
insert, delete, or modify just one line in the middle of the file, even if the line is blank
insert, delete, or modify just one line at the end of the file, even if the line is blank
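The nine single-line edits listed above can be generated mechanically when building a regression set for comparing engine versions. The sketch below is a hypothetical helper (`one_line_variants` is not part of SCANOSS); as assumptions, it inserts a blank line, deletes a line, or appends a `# edited` marker to a line at the beginning, middle, and end of a non-empty file.

```python
def one_line_variants(text: str) -> dict[str, str]:
    # Hypothetical helper: build the nine single-line edits (insert, delete,
    # modify at the beginning, middle and end) of a non-empty file.
    lines = text.splitlines()
    if not lines:
        raise ValueError("need at least one line")
    mid = len(lines) // 2
    variants = {}
    # Insertion points: before the first line, before the middle line, after the last line.
    for where, i in (("beginning", 0), ("middle", mid), ("end", len(lines))):
        variants[f"insert_{where}"] = lines[:i] + [""] + lines[i:]
    # Existing lines to delete or modify: the first, middle and last line.
    for where, i in (("beginning", 0), ("middle", mid), ("end", len(lines) - 1)):
        variants[f"delete_{where}"] = lines[:i] + lines[i + 1:]
        variants[f"modify_{where}"] = lines[:i] + [lines[i] + "  # edited"] + lines[i + 1:]
    # Re-join each variant with a trailing newline.
    return {name: "\n".join(v) + "\n" for name, v in variants.items()}
```

Each variant can then be scanned with both engine versions and the reported match percentages compared side by side.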

In these scenarios, the match accuracy of the latest version is clearly not as expected, whereas that of V5.3.0 is. You can refer to the following change for the difference between the two versions: https://github.com/scanoss/engine/blob/4a801c981cc15ac319465aafdf1d7b990b4e3d58/src/snippets.c#L673

mscasso-scanoss commented 7 months ago

Hello, @superkaiy, thank you so much for your feedback. I will keep it in mind for the next release. However, it's important to note that the engine, based on the winnowing algorithm, isn't designed to produce precise line ranges. The matching percentage is also an approximation and may not always accurately reflect reality. The snippet matching concept aims to assist users in identifying possible matches and snippets, but confirming the exact range requires manual validation or additional analysis, such as HPSM.
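To illustrate why a winnowing-based match is an approximation, here is a minimal Python sketch of the classic winnowing scheme (Schleimer, Wilkerson and Aiken, 2003). It is not the SCANOSS engine's implementation: the normalization (lowercasing and dropping whitespace), the CRC32 k-gram hash, the defaults `k=5` and `window=4`, and the `match_percent` metric are all assumptions chosen for illustration. Each fingerprint is the smallest hash in a sliding window of k-gram hashes, so an edit only changes the fingerprints whose windows overlap it.

```python
import zlib


def kgram_hashes(text: str, k: int) -> list[int]:
    # Assumed normalization: lowercase and drop all whitespace, so blank lines
    # and indentation do not affect the fingerprints.
    norm = "".join(ch.lower() for ch in text if not ch.isspace())
    # Hash every k-gram; CRC32 stands in for the rolling hash a real engine would use.
    return [zlib.crc32(norm[i:i + k].encode("utf-8")) for i in range(len(norm) - k + 1)]


def winnow(text: str, k: int = 5, window: int = 4) -> set[int]:
    # Select the minimum hash of every run of `window` consecutive k-gram
    # hashes. Positions are discarded; only the selected hash values are kept.
    hashes = kgram_hashes(text, k)
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def match_percent(original: str, modified: str, k: int = 5, window: int = 4) -> float:
    # Hypothetical metric: share of the modified file's fingerprints that
    # also occur in the original file.
    a = winnow(original, k, window)
    b = winnow(modified, k, window)
    if not b:
        return 0.0
    return 100.0 * len(a & b) / len(b)
```

Under this sketch, inserting a blank line leaves the normalized text, and therefore every fingerprint, unchanged, while appending one non-blank line only adds fingerprints from the few windows that overlap the new text. How the engine's own percentage reacts depends on its actual normalization, line mapping, and scoring, which this sketch does not model.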

I understand that you obtained more accurate ranges with the previous engine version in the cases you tested. However, the latest version actually yields better results on our extensive test dataset. Please feel free to open a pull request with your change against the current main branch, and I will test it.

Additionally, we are currently hosting a workshop in Madrid to discuss match accuracy. If you'd like to participate in a virtual meeting, please send me an email at mariano.scasso@scanoss.com. Your presence would be highly valuable to us.

superkaiy commented 7 months ago

Hello @mscasso-scanoss, thanks for your patience. As you said, snippet matching is only an aid, and the difference in results may be due to our different test datasets. Perhaps HPSM can meet requirements that need exact ranges, as more data is generated during the scan.

scanoss / engine, issue #57