petermr / docanalysis

Semantic analysis of text documents including sentence and paragraph splitting
Apache License 2.0

Docanalysis on ipcc documents #38

Open suroseth opened 6 months ago

suroseth commented 6 months ago

Hi, I wanted to extract information from IPCC documents using docanalysis, but I was not able to get any results: the result files don't contain any information. I even tried running the code in ipcc_analysis.ipynb, but I faced the same problem. I have attached the Colab notebook that I ran (ipcc_analysis_run: https://colab.research.google.com/drive/1c-Kin1hybI-tyHKaaIlvnRRxecgRd1Li) and the notebook that I was following (ipcc_analysis: https://colab.research.google.com/drive/1ZDutqm7psCiECOQFQeWtGFxp_6Y3hTX1), which was provided during the semantic hackathon. Please let me know if I'm missing something.

petermr commented 6 months ago

Greetings! The Colab notebooks you linked are not publicly visible by default. It's probably a good idea to make them public.

Docanalysis runs on the scholarly literature and does not (yet) run on IPCC content.

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

suroseth commented 6 months ago

Sorry for the inconvenience. Here is the link to the code that I was trying: https://drive.google.com/file/d/1ZTXjMi1wYkkyAhvzDtKK1RI-mdbMvuHg/view?usp=sharing

petermr commented 6 months ago

I got from you:


Notebook validation failed

An invalid notebook may not function properly. The validation error was:

Notebook Validation failed: {'pip_warning': {'packages': ['_distutils_hack', 'pkg_resources', 'setuptools']}, 'id': 'fd2cd6efeb914b0e945e2af19b1d64c5'} is not valid under any of the given schemas:
{
 "pip_warning": {
  "packages": [
   "_distutils_hack",
   "pkg_resources",
   "setuptools"
  ]
 },
 "id": "fd2cd6efeb914b0e945e2af19b1d64c5"
}

and for cell [15]

Counter()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-4033914c5813> in <cell line: 5>()
      3 climate_terms_hits = get_hit_counts(PATH)
      4 pprint(climate_terms_hits)
----> 5 generate_wordcloud(climate_terms_hits, FILE_NAME)

<ipython-input-14-555f99bd7172> in generate_wordcloud(hits_dictionary, file_name)
      2 def generate_wordcloud(hits_dictionary, file_name):
      3     wc = WordCloud(background_color='white', width = 700, height=300, margin=1)
----> 4     wc.fit_words(hits_dictionary)
      5     wc.to_file(file_name)

/usr/local/lib/python3.10/dist-packages/wordcloud/wordcloud.py in fit_words(self, frequencies)
    387         self
    388         """
--> 389         return self.generate_from_frequencies(frequencies)
    390
    391     def generate_from_frequencies(self, frequencies, max_font_size=None):  # noqa: C901

/usr/local/lib/python3.10/dist-packages/wordcloud/wordcloud.py in generate_from_frequencies(self, frequencies, max_font_size)
    408         frequencies = sorted(frequencies.items(), key=itemgetter(1), reverse=True)
    409         if len(frequencies) <= 0:
--> 410             raise ValueError("We need at least 1 word to plot a word cloud, "
    411                              "got %d." % len(frequencies))
    412         frequencies = frequencies[:self.max_words]

ValueError: We need at least 1 word to plot a word cloud, got 0.

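PMR: It seems you haven't got any words to plot! The empty Counter() printed just before the traceback shows that get_hit_counts(PATH) returned no hits, so the word cloud had nothing to draw.

Maybe add some print statements to see where this might have happened.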
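
One way to add those print statements is sketched below. It is only an illustration, not part of docanalysis or of the notebook: `describe_results` and `report_hits` are hypothetical helper names, and the sketch assumes that PATH is a directory holding the result files and that the hits are a `collections.Counter`, as the `Counter()` output above suggests.

```python
from collections import Counter
from pathlib import Path


def describe_results(path):
    """Return (relative file name, size in bytes) for every file under path."""
    root = Path(path)
    if not root.exists():
        return []
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


def report_hits(hits, path):
    """Print what was found and return True only if there is something to plot."""
    files = describe_results(path)
    print(f"{len(files)} file(s) under {path}")
    for name, size in files:
        flag = "  <-- empty" if size == 0 else ""
        print(f"  {name}: {size} bytes{flag}")
    print(f"{len(hits)} distinct term(s), {sum(hits.values())} hit(s) in total")
    return len(hits) > 0


# Hypothetical use in the notebook cell that failed (names taken from the traceback):
#
#     climate_terms_hits = get_hit_counts(PATH)
#     if report_hits(climate_terms_hits, PATH):
#         generate_wordcloud(climate_terms_hits, FILE_NAME)
```

If the listing shows no files, or only zero-byte files, the problem lies upstream of the word cloud, in the step that was supposed to write the results.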
