websites - Githubissues

Thanks! The disambiguated software names are really useful!

To maximize the results, I selected the software items whose "category" field contains any of the substrings ['website', 'platform', 'company', 'service']. Then I manually checked whether each filtered item is indeed a web platform, by doing a web search and reading the context of the mention. I ended up with a list of 39 software names: web_platform_disambiguated.txt (just found out that GitHub does not allow attaching files in CSV format ;( )

The mention_type of these 39 software_name labels has already been changed to "web platform" in our dataset.

Some web platforms, like neighborgoods and justpark, as @kermitt2 mentioned, do not have detailed information in Wikidata. I just modified their mention_type in the dataset. In the future, these will need to be manually coded (our coding guideline has already been updated accordingly).

Other things that Wikidata misses: some software name labels in our dataset, such as DINO, DTS, Nutritionist, Scion, TM4, and VISTA, are actual research software names, but in Wikidata they are homonyms referring to entities other than software (~8% of this non-representative sample :)

The CSV outputs are already updated in the repo!
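
For anyone who wants to reproduce the keyword filter, here is a minimal sketch of that kind of substring match. It is not the exact script used: only the "category" field and the four substrings come from the comment above, while the record layout, the case-insensitive matching, the helper names, and the sample entries are assumptions made up for illustration.

```python
# Substrings taken from the comment above.
KEYWORDS = ["website", "platform", "company", "service"]


def is_web_platform_candidate(category):
    """Return True if the category text contains any keyword.

    Matching is case-insensitive (an assumption); a missing
    category is treated as an empty string.
    """
    text = (category or "").lower()
    return any(keyword in text for keyword in KEYWORDS)


def filter_candidates(records):
    """Keep only the records whose "category" field matches a keyword."""
    return [r for r in records if is_web_platform_candidate(r.get("category"))]


# Hypothetical records, not taken from the real dataset.
sample = [
    {"software_name": "ExampleShare", "category": "online platform"},
    {"software_name": "ExampleStats", "category": "statistical package"},
    {"software_name": "ExampleCo", "category": "Company"},
    {"software_name": "ExampleNone", "category": None},
]

candidates = filter_candidates(sample)
candidate_names = [r["software_name"] for r in candidates]
print(candidate_names)
```

A filter like this only produces candidates. Each hit still needs the manual check described above, since a category such as "company" can also match entries that are not web platforms.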

howisonlab / softcite-dataset

websites #611