higgood / seechat-folks


Low coverage of recent publications in the literature review #39

Open wammar opened 1 week ago

wammar commented 1 week ago

Context: I have heard this complaint from at least 4 users so far.

Describe the bug: Users often can't find what they're looking for in the literature review, and they THINK it's because we're not including recent publications.

[Required] Where did you encounter the bug
[ ] While creating your account (first time user)
[ ] While logging in to your account
[ ] While applying changes to the settings (e.g., # of papers)
[ ] While typing the query
[ ] While Idea Factory was thinking or generating text
[x] After Idea Factory was done generating text

[Required] Reality vs. Expectation
Reality: Most papers are old.
Expectation: Most papers are new (but only for modern topics).

Thank you for improving the Idea Factory!!

muhammadnasr commented 1 week ago

Maybe related to "use better recency" #4.

wammar commented 1 week ago

@Mox301 Please take a look at this idea: https://demo.seechat.ai/idea/deb252de-6101-4c2d-924f-e8a3bf7edc32/Integrating-AI%2C-IoT%2C-and-Multidisciplinary-Insights-for-Dynamic-Customer-Profiling

The input has absolutely nothing to do with IoT but somehow an IoT survey made it to the top K papers and was used to generate a VERY BAD idea.

Looking at the list of publications, all of them have a ridiculously high number of citations. This suggests that we have a serious bug in our ranking algorithm, which allows publications with very high citation counts to dominate more relevant papers.
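One way to keep citation counts from swamping relevance is to damp them logarithmically and give them a small, bounded weight next to the relevance score. The sketch below is illustrative only and is not the project's actual ranking code: the `Paper` fields, the `score` and `rank` functions, the weights, the citation cap, and the recency half-life are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    relevance: float  # hypothetical query-similarity score in [0, 1]
    citations: int
    year: int


def score(paper, current_year, w_rel=0.7, w_cite=0.15, w_rec=0.15,
          half_life=5.0, cite_cap=10_000):
    """Blend relevance, damped citations, and recency into one score in [0, 1].

    All weights and constants are assumed values chosen for illustration.
    """
    # Log-damp citations and clamp to 1.0 so a huge count cannot dominate.
    cite_term = min(math.log1p(paper.citations) / math.log1p(cite_cap), 1.0)
    # Exponential decay: a paper `half_life` years old gets half the recency credit.
    age = max(current_year - paper.year, 0)
    recency_term = 0.5 ** (age / half_life)
    return w_rel * paper.relevance + w_cite * cite_term + w_rec * recency_term


def rank(papers, current_year, top_k=10):
    """Return the top_k papers, highest score first."""
    ordered = sorted(papers, key=lambda p: score(p, current_year), reverse=True)
    return ordered[:top_k]


if __name__ == "__main__":
    # Made-up example data, not taken from the reported idea.
    papers = [
        Paper("Highly cited off-topic survey", relevance=0.2, citations=5000, year=2015),
        Paper("Recent on-topic paper", relevance=0.9, citations=30, year=2023),
    ]
    for p in rank(papers, current_year=2024):
        print(f"{score(p, 2024):.3f}  {p.title}")
```

With these assumed weights, the citation term can contribute at most 0.15 to the total, so a weakly relevant paper cannot outrank a strongly relevant one on citations alone.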

It may also be worthwhile to understand why the IoT survey was included in the results in the first place. Perhaps we need to experiment with retrieving a smaller number of results when we call the search API? I'm not sure which API this was, but I'm assuming that we still have S2 as the primary for all users?

Mox301 commented 1 week ago

> It may also be worthwhile to understand why the IoT survey was included in the results in the first place. Perhaps we need to experiment with retrieving a smaller number of results when we call the search API? I'm not sure which API this was, but I'm assuming that we still have S2 as the primary for all users?

@wammar I found that the primary API was OpenAlex, which was my mistake. I have changed the primary API to S2 in all environments. Could you please try the same input again and share the output with me? This will give us insight into how the two data sources differ.