scieloorg / articles_meta

Web services for retrieving SciELO article metadata stored in MongoDB
BSD 2-Clause "Simplified" License

Provides an endpoint to identify IDs and languages from PDF routes #236

Closed robertatakenaka closed 2 years ago

robertatakenaka commented 2 years ago

What does this PR do?

Provides an endpoint to identify IDs and languages from PDF routes.

Where could the review start?

By commits.

How could this be tested manually?

Access the route /api/v1/article/pdfs/ with parameters:

The result is similar to that of the /api/v1/collection/identifiers/ endpoint, except that the content of objects is a list of:

{
    "processing_date": "2022-03-01",
    "collection": "col",
    "code": "S0121-03192020000200013",
    "pdfs": [
        {
            "lang": "en",
            "path": "pdf/v20n1s0/a01.pdf",
            "doi": "10.18273/REVMED.V33N2-2020013"
        },
        {
            "lang": "pt",
            "path": "pdf/v20n1s0/pt_a01.pdf",
            "doi": "10.18273/REVMED.V33N2-2020013.pt"
        }
    ]
}
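
To illustrate how a consumer might use this payload, here is a minimal sketch that indexes each PDF path by the article it identifies. The function name `index_pdf_paths` and the in-memory `objects` list are assumptions for illustration; only the record layout comes from the example above.

```python
def index_pdf_paths(objects):
    """Map each PDF path to the (collection, code, lang) it identifies."""
    index = {}
    for record in objects:
        for pdf in record.get("pdfs", []):
            index[pdf["path"]] = (record["collection"], record["code"], pdf["lang"])
    return index


# Sample data copied from the record shown in this description.
objects = [
    {
        "processing_date": "2022-03-01",
        "collection": "col",
        "code": "S0121-03192020000200013",
        "pdfs": [
            {"lang": "en", "path": "pdf/v20n1s0/a01.pdf",
             "doi": "10.18273/REVMED.V33N2-2020013"},
            {"lang": "pt", "path": "pdf/v20n1s0/pt_a01.pdf",
             "doi": "10.18273/REVMED.V33N2-2020013.pt"},
        ],
    }
]

index = index_pdf_paths(objects)
```

With such an index, a PDF path taken from an access log could be resolved to an article code and language in a single dictionary lookup.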

Any context you would like to provide?

Contributes to access counting (SUSHI).

Screenshots

n/a

Which tickets are relevant?

n/a

References

n/a

rafaelpezzuto commented 2 years ago

@robertatakenaka

When generating the COUNTER dictionaries, I iterate over every article that ArticleMeta delivers. With this new route, I would need to iterate over everything twice, because in the first iteration I extract the following fields:

article.v40 # default language
code
code_title
collection
created_at
fulltexts
processing_date
publication_date
publication_year
title.v68 # journal acronym
updated_at

Would it be possible to include them in this new route so that I iterate only once?

For the example you gave, it would be:

{
    "processing_date": "2022-03-01",
    "publication_date": "2020-08",
    "publication_year": "2020",
    "created_at": "2021-10-27",
    "updated_at": "2022-03-14",
    "default_language": "es",
    "code_title": [
        "1794-5240",
        "0121-0319"
    ],
    "journal_acronym": "muis",
    "collection": "col",
    "code": "S0121-03192020000200013",
    "htmls": [
        {
            "lang": "es",
            "url": "http://www.scielo.org.co/scielo.php?script=sci_arttext&pid=S0121-03192020000200013&tlng=es"
        }
    ],
    "pdfs": [
        {
            "lang": "es",
            "path": "pdf/muis/v33n2/0121-0319-muis-3302-109.pdf", # this is what the current COUNTER dictionary contains
            "doi": "10.18273/REVMED.V33N2-2020013"
        },
        {
            "lang": "en",
            "path": "pdf/v20n1s0/a01.pdf",
            "doi": "10.18273/REVMED.V33N2-2020013"
        },
        {
            "lang": "pt",
            "path": "pdf/v20n1s0/pt_a01.pdf",
            "doi": "10.18273/REVMED.V33N2-2020013.pt"
        }
    ]
}
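
As a rough sketch of what the single pass could look like on the consumer side, assuming the route returned records shaped like the one requested here: one function collects the article-level fields and the per-language PDF and HTML entries together. `build_counter_entry`, `ARTICLE_FIELDS`, and the output layout are hypothetical and do not reproduce the actual COUNTER dictionary format.

```python
# Article-level fields requested in this comment (names as in the example record).
ARTICLE_FIELDS = (
    "code", "collection", "code_title", "journal_acronym", "default_language",
    "created_at", "updated_at", "processing_date",
    "publication_date", "publication_year",
)


def build_counter_entry(record):
    """Collect article fields plus per-language PDF paths and HTML URLs."""
    entry = {field: record.get(field) for field in ARTICLE_FIELDS}
    entry["pdf_paths"] = {pdf["lang"]: pdf["path"] for pdf in record.get("pdfs", [])}
    entry["html_urls"] = {html["lang"]: html["url"] for html in record.get("htmls", [])}
    return entry


# Trimmed version of the example record, for illustration only.
record = {
    "code": "S0121-03192020000200013",
    "collection": "col",
    "default_language": "es",
    "journal_acronym": "muis",
    "pdfs": [
        {"lang": "en", "path": "pdf/v20n1s0/a01.pdf"},
        {"lang": "pt", "path": "pdf/v20n1s0/pt_a01.pdf"},
    ],
}

entry = build_counter_entry(record)
```

Fields missing from a record come back as `None` rather than raising, which keeps the loop going when a record is incomplete.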

Also, see how the fulltexts field looks for this article in ArticleMeta:

image

Note that fulltexts.pdf.es contains a path quite different from the one in the example you mentioned: http://www.scielo.org.co/pdf/muis/v33n2/0121-0319-muis-33-02-109.pdf. Is the proposed PDF path correct? I will install the application here to see what it returns.
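
One way to make this comparison mechanical is to reduce the fulltexts URL to its path and compare it with the path the endpoint reports. This is only a sketch; `pdf_path_from_url` is a hypothetical helper, and the two values are copied from this thread.

```python
from urllib.parse import urlparse


def pdf_path_from_url(url):
    """Return the URL path without its leading slash, e.g. 'pdf/...'."""
    return urlparse(url).path.lstrip("/")


fulltexts_url = "http://www.scielo.org.co/pdf/muis/v33n2/0121-0319-muis-33-02-109.pdf"
proposed_path = "pdf/v20n1s0/a01.pdf"  # path from the original example

actual_path = pdf_path_from_url(fulltexts_url)
paths_match = actual_path == proposed_path
```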

rafaelpezzuto commented 2 years ago

@robertatakenaka Testing in a real environment with dumps of the articlemeta collections, I got the following error when accessing the routes:

  1. /api/v1/article/pdfs
  2. /api/v1/article/pdfs/?from_date=2020-01-01&limit=10&issn=0100-879X
[2022-05-05 00:16:50 -0300] [3927] [ERROR] Error handling request /api/v1/article/pdfs/?from_date=2020-01-01&limit=10&issn=0100-879X
Traceback (most recent call last):
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/tweens.py", line 12, in _error_handler
    response = request.invoke_exception_view(exc_info)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/view.py", line 756, in invoke_exception_view
    raise HTTPNotFound
pyramid.httpexceptions.HTTPNotFound: The resource could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 136, in handle
    self.handle_request(listener, req, client, addr)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/gunicorn/workers/sync.py", line 179, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/router.py", line 270, in __call__
    response = self.execution_policy(environ, self)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/router.py", line 278, in default_execution_policy
    return request.invoke_exception_view(reraise=True)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/view.py", line 755, in invoke_exception_view
    reraise_(*exc_info)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/compat.py", line 148, in reraise
    raise value
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/router.py", line 276, in default_execution_policy
    return router.invoke_request(request)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/router.py", line 249, in invoke_request
    response = handle_request(request)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/tweens.py", line 41, in excview_tween
    response = _error_handler(request, exc)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/tweens.py", line 16, in _error_handler
    reraise(*exc_info)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/compat.py", line 148, in reraise
    raise value
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/tweens.py", line 39, in excview_tween
    response = handler(request)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/router.py", line 156, in handle_request
    view_name
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/view.py", line 642, in _call_view
    response = view_callable(context, request)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/viewderivers.py", line 390, in attr_view
    return view(context, request)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/viewderivers.py", line 368, in predicate_wrapper
    return view(context, request)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/viewderivers.py", line 439, in rendered_view
    result = view(context, request)
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/lib/python3.5/site-packages/pyramid/viewderivers.py", line 148, in _requestonly_view
    response = view(request)
  File "/home/rafaeljpd/Repos/articles_meta/articlemeta/articlemeta.py", line 443, in pdfs_paths
    until_date=until_date)
  File "/home/rafaeljpd/Repos/articles_meta/articlemeta/controller.py", line 1291, in pdfs_paths
    offset=offset, extra_filter=extra_filter)
  File "/home/rafaeljpd/Repos/articles_meta/articlemeta/controller.py", line 806, in pdfs_paths
    rec = _pdfs_paths(item)
  File "/home/rafaeljpd/Repos/articles_meta/articlemeta/controller.py", line 35, in _pdfs_paths
    if article.publisher_ahead_id:
  File "/home/rafaeljpd/.virtualenvs/scielo-articles-meta/src/xylose/xylose/scielodocument.py", line 2209, in publisher_ahead_id
    return self.data['article'].get('v881', [{'_': None}])[0]['_']
KeyError: 'article'
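
The last frame suggests that the record has no top-level `article` key, so `self.data['article']` fails before `.get('v881', ...)` can supply its default. Below is a minimal sketch of that failure and of one defensive alternative. Both functions operate on a plain dict; `safe_publisher_ahead_id` is a hypothetical helper, not part of xylose, and not necessarily the fix adopted in this project.

```python
def publisher_ahead_id(data):
    # Mirrors the expression in the traceback: raises KeyError if 'article' is missing.
    return data['article'].get('v881', [{'_': None}])[0]['_']


def safe_publisher_ahead_id(data):
    # Hypothetical guard: treat a missing 'article' section as "no ahead id".
    article = data.get('article') or {}
    return article.get('v881', [{'_': None}])[0]['_']


record_without_article = {"code": "S0121-03192020000200013"}

try:
    publisher_ahead_id(record_without_article)
    raised = False
except KeyError:
    raised = True
```

Whether such records should be skipped, logged, or repaired upstream is a separate decision; the guard only keeps the iteration from aborting.
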
robertatakenaka commented 2 years ago

Continued in #238.