Open roniemartinez opened 2 years ago
Hey, it's a good idea to use mlscraper as a backend. But first of all, we need data (inputs and outputs).
@daniel7an
Yes, I can see potential in this one.
@roniemartinez
Autoscraper is another one that would be great to have in Dude. It learns the scraping rules and returns similar elements. It just needs a few examples and isn't as complicated as mlscraper.
Input:
wanted_list = ["What are metaclasses in Python?"]
Output:
[
    'How do I merge two dictionaries in a single expression in Python (taking union of dictionaries)?',
    'How to call an external command?',
    'What are metaclasses in Python?',
    'Does Python have a ternary conditional operator?',
    'How do you remove duplicates from a list whilst preserving order?',
    'Convert bytes to a string',
    'How to get line count of a large file cheaply in Python?',
    "Does Python have a string 'contains' substring method?",
    'Why is “1000000000000000 in range(1000000000000001)” so fast in Python 3?'
]
Any ideas to add this one to Dude? Should I open a new issue for this?
@daniel7an
The thing is, I've been reading the source code of Autoscraper and it is not actually using Machine Learning or AI. It is just using difflib.SequenceMatcher. The project's claim that it runs on ML or AI is incorrect.
Please correct me if I am wrong. I cannot categorize it as such, but for sure it learns by saving rules.
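To illustrate the point, here is a minimal sketch of the kind of comparison difflib.SequenceMatcher performs. This is not Autoscraper's actual code: the helper names (`similarity`, `find_similar`), the 0.8 threshold, and the sample strings are all made up for illustration. It only shows that the matching is plain string similarity, with no trained model involved.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def find_similar(wanted: str, candidates: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the candidates whose similarity to `wanted` meets the threshold.

    Hypothetical helper: the threshold value is an assumption, not a value
    taken from Autoscraper.
    """
    return [c for c in candidates if similarity(wanted, c) >= threshold]


# Made-up attribute values, standing in for whatever a scraper might compare.
candidates = [
    "question-hyperlink",
    "question-hyperlinks",
    "post-tag",
]
matches = find_similar("question-hyperlink", candidates)
print(matches)
```

A saved "rule" in this sense could be just a remembered string plus a threshold, which would be consistent with "it learns by saving rules" without being ML.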
@daniel7an
Any ideas to add this one to Dude? Should I open a new issue for this?
Though it seems Autoscraper does not fall into this category, I believe it is a very powerful tool for web scraping and I'd love to include it. Please open a separate ticket.
@daniel7an
Done ✅
Possible format:
Potential backends: