Open jorisvandenbossche opened 1 week ago
Hi Joris! If I fix this, I could send you a PR. Would you be able to merge my PR then or give suggestions on my PR so it can be merged? I have a school assignment deadline of working on an open source good first issue where the owner will at the end merge my PR. I was wondering if you can assign me this and help me? I am a 4th year Computer Engineering major.
Also, would you be able to tell me what files I should look at for this so I can start? Do I fork the main branch?
Hi @tasfia8, kindly check the contributing docs (https://pandas.pydata.org/docs/development/contributing.html) for guidance on GitHub issue assignment, the proper format of PRs, etc.
I recommend working on an issue with the "good first issue" label, since those issues mainly involve simple fixes that are good for first-time contributors.
I have already started working on this. Would you be able to assign it to me? I think I can do it, and I have read the contributing docs. Thank you.
@tasfia8 - issue assignment can be found on the contributing docs
take
@jorisvandenbossche I think I have figured it out; I just wanted to show both you and @KevsterAmp before I make a PR. I will open a PR soon and let you know. I get this as output now, is this what you are expecting? I have added test cases as well, and all existing test cases still pass. Output:
The issue was that dict_keys was passed directly to the StringDtype's _from_sequence method, which could not handle non-array-like inputs like dict_keys. The fix involved updating the handling of dict_keys during the construction of an Index or Series.
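To illustrate the kind of handling described above, here is a minimal sketch that materializes a dict view into a list before construction. The helper name `materialize_dict_view` is hypothetical and is not taken from the PR or from pandas itself; the actual fix may be placed and written differently.

```python
from collections.abc import KeysView, ValuesView

import pandas as pd


def materialize_dict_view(data):
    """Hypothetical helper: turn dict key/value views into a plain list.

    Anything else is returned unchanged, so array-likes keep their own
    construction path.
    """
    if isinstance(data, (KeysView, ValuesView)):
        return list(data)
    return data


d = {"a": 1, "b": 2}

# Assumption: pd.StringDtype() stands in for the explicit string dtype.
idx = pd.Index(materialize_dict_view(d.keys()), dtype=pd.StringDtype())
print(idx)
```

A list is something `_from_sequence` already handles, so normalizing the input up front avoids touching the string array code itself.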
@tasfia8 apologies for the slow response. The output you show is indeed the expected behaviour. I think the easiest will be to make a PR so we can see the code and more easily give feedback (and feel free to mark the PR as "draft" if you are unsure if it is ready, but then we can already take a look)
Done @jorisvandenbossche. Please see https://github.com/pandas-dev/pandas/pull/60383.
When not specifying a dtype (inferring the type), construction of `Index` or `Series` from dict keys goes fine. But if you explicitly specify the dtype, then it fails.
The reason is that at that point we pass the data directly to the dtype's array `_from_sequence` instead of first pre-processing the data into a numpy array, and `_from_sequence` calling `ensure_string_array` directly doesn't seem to be able to handle dict keys (although we do call `np.asarray(..)` inside `ensure_string_array`, so not entirely sure what is going wrong).
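A minimal sketch of the two cases, assuming a small example dict and `pd.StringDtype()` as the explicit dtype (the exact dtype in the report may differ). Whether the explicit-dtype call raises depends on the pandas version, so it is wrapped in `try`/`except` rather than assumed to fail. The last part shows one possible contributing factor, offered as an observation rather than a confirmed diagnosis: NumPy does not treat a dict view as a sequence, so `np.asarray` wraps it in a 0-dimensional object array instead of a 1-D array of keys.

```python
import numpy as np
import pandas as pd

d = {"a": 1, "b": 2}

# Inferring the dtype from dict keys: reported to work.
inferred = pd.Index(d.keys())
print(inferred)

# Explicit dtype: reported to fail on affected versions.
try:
    explicit = pd.Index(d.keys(), dtype=pd.StringDtype())
    print(explicit)
except Exception as exc:  # broad on purpose; the error type is not assumed
    explicit = None
    print(f"explicit dtype failed: {type(exc).__name__}: {exc}")

# np.asarray on a dict view gives a 0-d object array holding the view itself,
# whereas a list of the same keys gives a 1-D array.
keys_as_array = np.asarray(d.keys(), dtype=object)
list_as_array = np.asarray(list(d.keys()), dtype=object)
print(keys_as_array.shape, list_as_array.shape)
```

Passing `list(d.keys())` instead of `d.keys()` is a simple way to sidestep the problem on affected versions.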