medialab / hyphe

Websites crawler with built-in exploration and control web interface
http://hyphe.medialab.sciences-po.fr/demo/
GNU Affero General Public License v3.0

Allow entities with only a TLD in their LRU to be divided (add_webentity_creationrule) #445

Closed Klocohdonou closed 2 years ago

Klocohdonou commented 2 years ago

Hi Benjamin!

I have an entity whose LRU prefix matches the entire .org TLD (see issue #444 for the reason). Its LRU prefix is s:http|h:org|. I tried to divide the entity using a WebEntity creation rule from the web UI, but Hyphe rejects it, saying the LRU is invalid.

Request: {"method":"store.add_webentity_creationrule","params":["s:http|h:org|","prefix 1","my_corpus_name"]}

Response from the Hyphe API: [{"message": "ERROR: s:http|h:org| is not a proper LRU.", "code": "fail"}]
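For readers unfamiliar with the LRU notation, here is a minimal Python sketch that rebuilds the request payload quoted above and splits the prefix into its stems. The helper names (`split_lru`, `host_stems`, `build_rule_request`) are hypothetical and only illustrate the format. This is not Hyphe's actual validation code, whose exact rules are not shown in this thread.

```python
import json


def split_lru(lru):
    """Split an LRU string such as 's:http|h:org|' into (kind, value) stems."""
    stems = []
    for stem in lru.split("|"):
        if not stem:
            continue  # skip the empty piece left by the trailing '|'
        kind, _, value = stem.partition(":")
        stems.append((kind, value))
    return stems


def host_stems(lru):
    """Return only the host ('h:') stems of an LRU, in LRU order (TLD first)."""
    return [value for kind, value in split_lru(lru) if kind == "h"]


def build_rule_request(lru_prefix, rule, corpus):
    """Build the JSON body for the call quoted in the issue.

    The method name and parameter order are copied from the request above;
    how the body is sent (URL, transport) is deliberately left out.
    """
    return json.dumps({
        "method": "store.add_webentity_creationrule",
        "params": [lru_prefix, rule, corpus],
    })


if __name__ == "__main__":
    prefix = "s:http|h:org|"
    print(split_lru(prefix))
    print(host_stems(prefix))
    print(build_rule_request(prefix, "prefix 1", "my_corpus_name"))
```

Seen this way, the prefix in question carries a single host stem, the TLD itself, which is the case the API refuses here.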

Do you think this could be fixed?

Thanks a lot again for your help, Kevin

boogheta commented 2 years ago

Actually, the behavior here is the correct one, and it's normal that it refuses it. Fixing #444 should prevent this from happening, so I don't think we should remove the rule here. I'm afraid that for your corpus you will have to work around it, as you started doing, by declaring other entities, sorry.

Klocohdonou commented 2 years ago

Thanks for answering! Yeah, declaring subentities one by one with a script works well, I'll keep doing just that.
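
The script mentioned above is not included in the thread. As a rough idea of what such a workaround could look like, the sketch below turns a list of host names into LRU prefixes and builds one request body per subentity. Several things here are assumptions: the method name `store.declare_webentity_by_lru` and its `(lru_prefix, corpus)` parameter order should be checked against the Hyphe API documentation, the host names are placeholders, and the naive label reversal does not treat multi-label suffixes such as `co.uk` as a single stem.

```python
import json


def host_to_lru(host, scheme="http"):
    """Convert 'example.org' to 's:http|h:org|h:example|' (naive label reversal)."""
    labels = [label for label in host.lower().split(".") if label]
    stems = ["s:" + scheme] + ["h:" + label for label in reversed(labels)]
    return "|".join(stems) + "|"


def build_declare_requests(hosts, corpus, method="store.declare_webentity_by_lru"):
    """Build one request body per host.

    The default method name and the parameter order are assumptions to verify
    against the Hyphe API documentation before use.
    """
    return [
        {"method": method, "params": [host_to_lru(host), corpus]}
        for host in hosts
    ]


if __name__ == "__main__":
    # Placeholder hosts; replace with the subdomains found in the corpus.
    for request in build_declare_requests(["example.org", "www.example.org"],
                                          "my_corpus_name"):
        print(json.dumps(request))
```

Sending each body to the API is left out on purpose, since the endpoint URL depends on the Hyphe installation.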