okiriza / esg-paper

MIT License

Update ner.py #4

Closed okiriza closed 11 months ago

okiriza commented 11 months ago
reinhack commented 11 months ago

Questions:

  1. Does company_data have to be in .txt format? Right now it is a .csv.
  2. The output becomes (start_pos, end_pos, company_name/ticker mentioned in text, ticker). What exactly does "company_name/ticker mentioned in text" mean?

Say the example input looks like this:

text = """
MEDCOW is not the same as MEDC. 
JAKARTA is not the same with ARTA. 
KARTOS is not the same with ARTO.
Telkom Indonesia (Persero) Tbk is also known as TLKM.
"""

The current output is: [(63, 67, 'ARTA'), (98, 102, 'ARTO'), (27, 31, 'MEDC'), (104, 134, 'Telkom Indonesia (Persero) Tbk'), (152, 156, 'TLKM')]

what's the intended output?

okiriza commented 11 months ago
  1. Either format is fine.
  2. The third element is the string exactly as it appears in the text. The fourth element is the ticker it has been mapped to. For example, if "PT Telkom" appears in the text, the third element is "PT Telkom" and the fourth element is "TLKM".
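
For illustration, here is a minimal sketch of the 4-tuple shape described above. It is not the code in ner.py: the `ALIASES` table, the `find_mentions` function, and the sample sentence are all hypothetical, and the regex lookarounds are just one assumed way to keep MEDCOW from matching MEDC.

```python
import re

# Hypothetical alias table (surface form -> ticker); not taken from the repository.
ALIASES = {
    "PT Telkom": "TLKM",
    "Telkom Indonesia (Persero) Tbk": "TLKM",
    "TLKM": "TLKM",
    "MEDC": "MEDC",
}


def find_mentions(text, aliases=ALIASES):
    """Return (start_pos, end_pos, mention_as_written, ticker) tuples."""
    # Try longer aliases first so a full name wins over a shorter alias inside it.
    ordered = sorted(aliases, key=len, reverse=True)
    # The lookarounds reject matches embedded in a longer word (e.g. MEDCOW).
    pattern = re.compile(
        "|".join(r"(?<!\w)" + re.escape(a) + r"(?!\w)" for a in ordered)
    )
    return [
        (m.start(), m.end(), m.group(0), aliases[m.group(0)])
        for m in pattern.finditer(text)
    ]


sample = "MEDCOW is not MEDC. PT Telkom is also known as TLKM."
mentions = find_mentions(sample)
print(mentions)
```

With this sample, the result is `[(14, 18, 'MEDC', 'MEDC'), (20, 29, 'PT Telkom', 'TLKM'), (47, 51, 'TLKM', 'TLKM')]`: the third element keeps the text as written, and the fourth is the mapped ticker.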