nmfs-fish-tools / fishdictionary

A dictionary scheme for fisheries
https://connect.fisheries.noaa.gov/fishdictionary
GNU Affero General Public License v3.0

Searching references when looking for terms in the pdf files #14

Open · kellijohnson-NOAA opened this issue 2 years ago

kellijohnson-NOAA commented 2 years ago

@Bai-Li-NOAA I noticed that unfished|virgin|equilibrium comes up just once in noaa_17252_DS1.pdf, so I went to the PDF to see where it occurred, and it was in the reference section. Do you think we should try to exclude the reference section from the search, or just not worry about it? The trouble with eliminating the reference section is that it is often in the middle of the document. I vote for not worrying about it and mentioning it in the Discussion section or something along those lines, but I wanted to get others' thoughts.
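A minimal Python sketch of the kind of count described here, assuming the PDF text has already been extracted to a string (the extraction step is not shown). The function names, the `\b` word boundaries, and the case-insensitive matching are assumptions, not the project's actual code. Printing the text around each hit is one way to spot a match that lands in the reference list without opening the PDF.

```python
from __future__ import annotations

import re

# The alternation from the comment above. The \b word boundaries and
# case-insensitivity are assumptions, not necessarily what the project uses.
PATTERN = re.compile(r"\b(?:unfished|virgin|equilibrium)\b", re.IGNORECASE)


def count_matches(text: str, pattern: re.Pattern = PATTERN) -> int:
    """Count every non-overlapping match of `pattern` anywhere in `text`."""
    return len(pattern.findall(text))


def match_contexts(text: str, pattern: re.Pattern = PATTERN, width: int = 40) -> list[str]:
    """Return the text surrounding each match so a reviewer can see where it fell."""
    snippets = []
    for m in pattern.finditer(text):
        start = max(m.start() - width, 0)
        end = min(m.end() + width, len(text))
        snippets.append(text[start:end].replace("\n", " "))
    return snippets


# Toy text (hypothetical): the only hit sits in a citation, yet it still counts.
sample = "Methods describe B0.\nReferences\nSmith, J. 2010. Estimating unfished biomass.\n"
print(count_matches(sample))
print(match_contexts(sample))
```

Because the count runs over the whole string, a term that appears only in a cited title is counted the same as one in the body text, which is the situation reported for noaa_17252_DS1.pdf.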

chantelwetzel-noaa commented 2 years ago

If omitting the reference section is too difficult (and it could be, especially since we will want to scan sections that occur after the references, such as tables and figures), perhaps we shouldn't worry about it. We could make strategic decisions about how to present the keyword search information. If we opt to use only a word cloud, terms that are found only once or a few times will likely not be visible. Alternatively, if we opt for a table of key terms, we could impose a minimum count for a term to be included, which would also deal with this.
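The "lower bound on items" idea for a table of key terms could look like the sketch below. The function name and the default `min_count=3` are hypothetical choices for illustration, and the counts in the example are made up, not results from the assessments.

```python
from __future__ import annotations


def terms_above_threshold(counts: dict[str, int], min_count: int = 3) -> list[tuple[str, int]]:
    """Keep terms seen at least `min_count` times, sorted by count (desc) then name."""
    kept = [(term, n) for term, n in counts.items() if n >= min_count]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy counts for illustration only, not results from the assessments.
toy_counts = {"unfished": 1, "recruitment": 57, "steepness": 9, "spawning biomass": 42}
print(terms_above_threshold(toy_counts))
```

With a threshold like this, a single stray hit from the reference list (such as the one count of unfished|virgin|equilibrium) simply drops out of the table instead of requiring the references to be removed.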

I have also been wondering whether we should increase the number of assessments summarized. I initially grabbed only 2-3 from each region, thinking we would use them to guide decisions on which terms to include in our glossary, but if we want to present this as a more robust synthesis, we may want to add more assessments. What do others think?

Bai-Li-NOAA commented 2 years ago

Good catch, Kelli! Thanks to both of you for already suggesting solutions. I will try removing the reference section before counting keywords. If that doesn't work, I agree that we could create a word cloud/table with a lower bound on items.
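One heuristic for removing the reference section before counting, sketched in Python under loud assumptions: the PDF text is already a string, the references start at a line reading only "References" or "Literature Cited", and they end at the next line that starts a "Tables", "Figures", or "Appendix" section. These heading conventions are hypothetical and will not hold for every assessment, but cutting only up to the next heading keeps the tables and figures that follow the references, as Chantel pointed out.

```python
from __future__ import annotations

import re

# Hypothetical heading conventions: a whole line containing only one of these words.
REF_START = re.compile(r"^\s*(?:references|literature cited)\s*$", re.IGNORECASE | re.MULTILINE)
REF_END = re.compile(r"^\s*(?:tables|figures|appendix\b.*)\s*$", re.IGNORECASE | re.MULTILINE)


def strip_reference_section(text: str) -> str:
    """Drop text from a reference heading up to (not including) the next
    Tables/Figures/Appendix heading; keep everything else."""
    start = REF_START.search(text)
    if start is None:
        return text  # no recognizable heading: leave the document unchanged
    end = REF_END.search(text, start.end())
    if end is None:
        return text[:start.start()]  # references run to the end of the text
    return text[:start.start()] + text[end.start():]


doc = (
    "Introduction\nUnfished biomass is B0.\n"
    "References\nSmith 2010. Virgin biomass.\n"
    "Tables\nTable 1. Equilibrium yield.\n"
)
print(strip_reference_section(doc))
```

If no heading is recognized, the text is returned unchanged, so the worst case falls back to searching the whole document as before.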

Chantel, I like the idea of increasing the number of assessments summarized; what would be the maximum number per region? I could also help modify the output table so the counts can be grouped by region.
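Grouping the output counts by region could be as simple as summing per-document counts, assuming the output table can be read as rows of (region, document, term, count). That row layout, the function name, and the file names below are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def counts_by_region(rows: Iterable[tuple[str, str, str, int]]) -> dict[str, dict[str, int]]:
    """Sum per-document term counts into {region: {term: total}}.

    Each row is (region, document, term, count); the document name is not
    used here, but keeping it lets the same rows be summarized per document.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for region, _document, term, count in rows:
        totals[region][term] += count
    return {region: dict(terms) for region, terms in totals.items()}


# Hypothetical rows for illustration only.
rows = [
    ("AFSC", "doc_a.pdf", "steepness", 2),
    ("AFSC", "doc_b.pdf", "steepness", 3),
    ("SEFSC", "doc_c.pdf", "steepness", 1),
]
print(counts_by_region(rows))
```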

(Sent by email in reply to Chantel Wetzel's comment of Wed, Mar 2, 2022, 10:42 AM.)

chantelwetzel-noaa commented 2 years ago

I went to Stock Smart (https://www.st.nmfs.noaa.gov/stocksmart?stockname=Scup%20-%20Atlantic%20Coast&stockid=10286) and pulled additional assessment documents for all Science Centers. The number of assessments for some Science Centers (PIFSC and SWFSC) was limited by what was available, but I tried to grab a large selection across a range of species. I have added the following files to the Google Drive folder ("Assessment Docs"):

AFSC: 16
NWFSC: 18
NEFSC: 16
PIFSC: 8
SEFSC: 22
SWFSC: 4
Total: 84
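Since the number of documents per center ranges from 4 (SWFSC) to 22 (SEFSC), raw counts grouped by region may be hard to compare. One option, not something proposed in this thread, is to report the share of each center's documents that mention a term. The sketch below uses the per-center numbers listed above; the function name and the example tallies are hypothetical.

```python
from __future__ import annotations

# Assessment documents per Science Center, as listed in the comment above.
DOCS_PER_CENTER = {"AFSC": 16, "NWFSC": 18, "NEFSC": 16, "PIFSC": 8, "SEFSC": 22, "SWFSC": 4}


def share_of_documents(docs_with_term: dict[str, int],
                       docs_per_center: dict[str, int] = DOCS_PER_CENTER) -> dict[str, float]:
    """Fraction of each center's documents that mention a term at least once."""
    return {center: docs_with_term.get(center, 0) / n for center, n in docs_per_center.items()}


# Hypothetical tallies of documents mentioning some term (not real results).
share = share_of_documents({"SEFSC": 11, "SWFSC": 2})
print(share)
```

Here 11 of 22 SEFSC documents and 2 of 4 SWFSC documents both come out to 0.5, even though the raw tallies differ by more than a factor of five.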

Bai-Li-NOAA commented 2 years ago

Thanks, Chantel. I will update the text-mining outputs using the updated assessment documents.

(Sent by email in reply to Chantel Wetzel's comment of Thu, Mar 3, 2022, 10:26 AM.)