sciunto-org / python-bibtexparser

Bibtex parser for Python 3
https://bibtexparser.readthedocs.io
MIT License
468 stars 130 forks source link

Url field renamed as link #171

Open omangin opened 7 years ago

omangin commented 7 years ago

The current behavior for homogenize_field is to rename url to link. It seems however that url is more standard and understood at least by biblatex and natbib (according to this and this).

Shouldn't the behavior be changed to the opposite (renaming link to url)?

SamuelCabralCruz commented 7 years ago

It would be really easy to perform this correction. I think the only thing we got to do is to modify the alt_dict which contains the field renaming information.

SamuelCabralCruz commented 7 years ago

If for any reason the authors of the package do not want to modify this aspect, a simple tweak to correct the situation would be the following:

new_alt_dict = {
    'keywords': 'keyword',
    'keyw': 'keyword',
    'subjects': 'subject',
    'urls': 'url',
    'link': 'url',
    'links': 'url',
    'editors': 'editor',
    'authors': 'author'}
parser = bibtexparser.bparser.BibTexParser()
parser.alt_dict = new_alt_dict
bib_database = bibtexparser.loads(bibtex_str=<bibtex_string>, parser=parser)

Another way but less precise and less convenient would be to set your parser homogenize_field attribute to False.

Phyks commented 7 years ago

@omangin :+1: url seems to be more popular.

@SamuelCabralCruz do you want to submit a PR for this and update the tests?

Not sure if we should keep the two behaviors or not though :/

Thanks!

SamuelCabralCruz commented 7 years ago

@Phyks When you say keep the two behaviors you mean that we could simply set the alt_dict as an input parameter to __init__() method of the BibTexParser class. However, we could use the alt_dict I just proposed by default. This would be the simpliest way of making the correction.

I will wait for your respond before submitting the pull request.

Phyks commented 7 years ago

I was thinking about doing something like this

parser = bibtexparser.bparser.BibTexParser(convert_link_to_url=False)

to restore the current behavior. But I am definitely not sure this is a good thing to do. I think we should just add more docs on this, so that users can just edit alt_dict manually.

SamuelCabralCruz commented 7 years ago

@Phyks That would definetly works also, but if the same "problem" arise for any other field we will face the same problem. Modifying the alt_dict manually is really close to my proposition, but if we want to respect the principles of object oriented programming, the attributes of a class should never be modified outside the class methods. We could then add a gettter and setter which would make some assertions whether or not the provided alt_dict is valid. Those assertion could valid if the keys of the dictionary are valid BibTeX's field and that the value associated to each key is of str type. Overall, I stick to my guns thinking the simplest way to adapt the package for this issue would be to allow the passage of a hand made dictionary to the constructor of the BibTexParser class. The downside of my proposition is the fact that we will use a mutable type object as an argument which may provoque breach of encapsulation.

Phyks commented 7 years ago

Being able to do

> parser = bibtexparser.bparser.BibTexParser()
> parser.get_replacement_dict()
{
    'keywords': 'keyword',
    'keyw': 'keyword',
    'subjects': 'subject',
    'urls': 'url',
    'link': 'url',
    'links': 'url',
    'editors': 'editor',
    'authors': 'author'
}
> parser.set_replacement('link', 'foobar')
> parser.get_replacement_dict()
{
    'keywords': 'keyword',
    'keyw': 'keyword',
    'subjects': 'subject',
    'urls': 'url',
    'link': 'foobar',
    'links': 'url',
    'editors': 'editor',
    'authors': 'author'
}

would really be awesome! :)

If you can implement it with tests and doc in a PR, that would be truly awesome!

omangin commented 7 years ago

It would be really easy to perform this correction. I think the only thing we got to do is to modify the alt_dict which contains the field renaming information.

I agree, I just wanted to get more opinions before proceeding.

Regarding the getter and setter, I'm under the impression that the pythonic way of doing that would just be a parser.alt_dict = {...}.

SamuelCabralCruz commented 7 years ago

@Phyks I can see to it! @omangin I agree with you that the pythonic way would be as you say, but according to me there are two really big downsides to python:

  1. Documentation of the modules, packages, libraries are often really deficient
  2. No concept of encapsulation and privatization (only import masking using _ prefix) This is why I take the habit of creating getters and setters for my own package to tell the user that I do not want him (or that I don't think he is supposed to) play with the attribute of the class interfaced.

I would be happy to create the pull request, but I respect your ownership of the package. What would you prefer?

omangin commented 7 years ago

@SamuelCabralCruz, I agree with you that python does not exactly promote what would elsewhere be considered as best practice, but I think the most important thing here is about documenting the feature so that it's used in a correct way.

For me, as long as what the dictionary is used for is well explained, setting and getting the dictionary as an attribute is the most straightforward to implement (because there is nothing to implement) and to use. The dictionary structure exactly matches the replacement semantics and the getter and setter are no more than the dictionary's.

On the other hand, if you wish to implement the set_replacement, get_replacement_dict, as far as I'm concerned you can go for it and it would be a good way of improving how the whole feature is controlled. You'll then also need a method for emptying the dictionary and tests for all of them. In this case it would also probably make more sense to call the method *_field_replacement* and rename the attribute that stores the dictionary. Finally, we should also think about the homogenize_fields option, that might be implemented slightly differently: emptying the dictionary during __init__ when the option is False, so that later setting a replacement implies that it will be used. If we need such an option, it should probably then be directly in the parse method. Anyway, it seems that these modifications, would need a PR on there own…

I'll make the PR for switching the link, url replacement. Feel free to make another one for improving the replacement behavior.

omangin commented 7 years ago

Done for the PR. After looking at it again it feels like homogenize_fields should probably be implemented as a customization function, that would take the dictionary as an optional argument. And while we are there, customization should probably be put in plural. It seems that one currently needs to manually chain the functions to perform several customizations. I believe setting a list of customization would make more sense.

If you agree, we should move all this to its own new issue…

SamuelCabralCruz commented 7 years ago

@omangin I do agree with you. The first time I looked at the customization I had to take a 2 minutes of reflexion to figure out how to use it. There is no documentation in addition! Making this part simplier would be a great idea. However, the reason why we are differencing customization and field homogenization is due to relative importance. According to me, one of the feature of a good parser is its ability to combine the input from multiple sources into an homogenized one. So I think that it would be a better idea of letting the field homogenization into the default behavior of the BibTexParser.

However, I do support your idea of creating a customizations method which would make the customization process easier and any modification to default alt_dict could be done via a specific customization as you said.