soachishti / moss.py

Python client for Moss: A System for Detecting Software Similarity
MIT License

Returning empty URL #25

Closed linus-cheng-xw closed 5 years ago

linus-cheng-xw commented 5 years ago

Hello, I've had success with the program in the past, until recently. I used to submit about 45 files in one go, but currently it works with only 2.

I did not change anything on my computer, and I've supplied the correct userid. I've checked with telnet that I can successfully connect to moss.stanford.edu on port 7690.

Here is the error that is raised:


Exception                                 Traceback (most recent call last)
in
     19
     20 #Save report file
---> 21 m.saveWebPage(url, "report/7pm_report.html")
     22
     23 # Download whole report locally including code diff links

~\AppData\Local\Continuum\anaconda3\lib\site-packages\mosspy\moss.py in saveWebPage(self, url, path)
    142     def saveWebPage(self, url, path):
    143         if len(url) == 0:
--> 144             raise Exception("Empty url supplied")
    145
    146         response = urlopen(url)

Exception: Empty url supplied

soachishti commented 5 years ago

You have closed the issue. Have you found the solution already?

linus-cheng-xw commented 5 years ago

Yes, MOSS was returning too many items. I fixed it by changing the parameters in self.options.

soachishti commented 5 years ago

Awesome! Can you share which options worked for you, so that we have a record here?

linus-cheng-xw commented 5 years ago

Sure!

self.options = { "l": "c", "m": 4, "d": 0, "x": 0, "c": "", "n": 50 }
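The same settings can probably be applied from calling code rather than by editing the installed moss.py. The following is a minimal sketch, assuming the `setIgnoreLimit` and `setNumberOfMatchingFiles` setters that mosspy provides for the `m` and `n` options. `configure_and_send` and `FakeMoss` are hypothetical names introduced here for illustration; the stand-in class only exists so the sketch runs without contacting the server.

```python
def configure_and_send(moss, ignore_limit=4, matching_files=50):
    """Apply the -m / -n settings, submit, and fail early on an empty URL.

    `moss` is assumed to be a mosspy.Moss instance, or any object with the
    same setIgnoreLimit / setNumberOfMatchingFiles / send methods.
    """
    moss.setIgnoreLimit(ignore_limit)              # the "m" option
    moss.setNumberOfMatchingFiles(matching_files)  # the "n" option
    url = moss.send()
    if not url:
        # Raise here with a hint instead of letting saveWebPage fail later
        # with "Empty url supplied".
        raise RuntimeError(
            "MOSS returned an empty URL; try lowering the m or n option"
        )
    return url


class FakeMoss:
    """Hypothetical stand-in for mosspy.Moss so the sketch runs offline."""

    def __init__(self, url):
        # Placeholder starting values, not necessarily mosspy's defaults.
        self.options = {"l": "c", "m": 10, "d": 0, "x": 0, "c": "", "n": 250}
        self._url = url

    def setIgnoreLimit(self, limit):
        self.options["m"] = limit

    def setNumberOfMatchingFiles(self, n):
        self.options["n"] = n

    def send(self):
        return self._url
```

With a real client, the call would look like `url = configure_and_send(m)` followed by `m.saveWebPage(url, "report/7pm_report.html")`, after the files have been added.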

I think what made the difference is the m parameter. The official documentation says:

The -m option sets the maximum number of times a given passage may appear before it is ignored. A passage of code that appears in many programs is probably legitimate sharing and not the result of plagiarism. With -m N, any passage appearing in more than N programs is treated as if it appeared in a base file (i.e., it is never reported). Option -m can be used to control moss' sensitivity. With -m 2, moss reports only passages that appear in exactly two programs. If one expects many very similar solutions (e.g., the short first assignments typical of introductory programming courses) then using -m 3 or -m 4 is a good way to eliminate all but truly unusual matches between programs while still being able to detect 3-way or 4-way plagiarism. With -m 1000000 (or any very large number), moss reports all matches, no matter how often they appear. The -m setting is most useful for large assignments where one also [has] a base file expected to hold all legitimately shared code. The default for -m is 10.
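To make the quoted rule concrete, here is a toy model of the -m filter. It is only an illustration of the documented behaviour, not MOSS's actual algorithm; `reportable_passages` and the passage/program names are hypothetical.

```python
def reportable_passages(occurrences, m=10):
    """Keep passages shared by at least 2 and at most m programs.

    occurrences maps a passage id to the set of programs containing it.
    Passages found in more than m programs are dropped, as if they had
    appeared in a base file.
    """
    return {
        passage: programs
        for passage, programs in occurrences.items()
        if 2 <= len(programs) <= m
    }


occurrences = {
    "loop_header": {"p1", "p2", "p3", "p4", "p5"},  # common boilerplate
    "odd_helper": {"p1", "p2"},                     # shared by two programs
    "rare_trick": {"p1", "p2", "p3"},               # shared by three programs
    "unique": {"p4"},                               # no match at all
}

print(sorted(reportable_passages(occurrences, m=2)))
print(sorted(reportable_passages(occurrences, m=4)))
```

With `m=2` only the two-way match survives; with `m=4` the three-way match is also kept, while the passage found in all five programs is still ignored.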

The options are explained in depth here: http://moss.stanford.edu/general/scripts/mossnet

soachishti commented 5 years ago

Excellent! Thanks a lot for sharing the information.