marvinody / mercarius

Mercari US API Wrapper
1 star, 1 fork

Duplicate Checker function PR #1

Open Gresliebear opened 1 year ago

Gresliebear commented 1 year ago

I ran the code and made a duplicate checker function; there will be some duplicates, as you mentioned. Here's a short one. I will make a pull request; could you check the PR? I am worried about passing the wrong type. I am really, really super rusty with Python, so...

The output of the function:

```
total merch count:  71
All IDs in query:
m61132818055
m39120640854
m92469760999
m37079895063
m88170829921
m69386825764
m13815872112
m13819965479
m82116262430
m70078119859
m89363826960
m23772677937
m85710301598
m67572434544
m94220323203
m46142669582
m20877134328
m24661369557
m67407169825
m98631682158
m56229363647
m35351619203
m96455037000
m24455263937
m43512602140
m88285315954
m34038354939
m16060417985
m80817266474
m81468388995
m61132818055
m39120640854
m38402351777
m52225487406
m61602614181
m84582295955
m81121739753
m68777053566
m22426628773
m36698199996
m92469760999
m82541653619
m37079895063
m82116262430
m56964957307
m88170829921
m16471956877
m77974106655
m66960562467
m59210806216
m23772677937
m88285315954
m85710301598
m77233606835
m13950725570
m27206658590
m51832327780
m52352660386
m79921916683
m62008245132
m32688002435
m16244417819
m98595159233
m52225487406
m84399687712
m40289347081
m87625705729
m51552301584
m79302221510
m42953152718
m87058404693
Number of Dups:10
Number of Dups:0
```

The function itself:

```python
import json

from mercarius import search, SearchItemStatus

# Collect every on-sale listing for the query into a list
merch = list(search('plush touhou', status=SearchItemStatus.ON_SALE))

# print(merch)
print("total merch count: ", len(merch))

print('All IDs in query:')
print('\n'.join(x["id"] for x in merch))


def DupCheckers(resp):
    """Return the items in resp with repeated "id"s removed (first occurrence kept)
    and print how many duplicates were dropped."""
    unique_ids = set()
    filtered_data = []
    total_items = 0

    for item in resp:
        total_items += 1
        # Keep an item only the first time its "id" is seen
        if item["id"] not in unique_ids:
            unique_ids.add(item["id"])
            filtered_data.append(item)

    # Every item that was not kept repeats an earlier "id"
    total_duplicates = total_items - len(filtered_data)
    print(f"Number of Dups:{total_duplicates}")
    return filtered_data


check = DupCheckers(merch)  # printed "Number of Dups:10" in the run shown above
DupCheckers(check)          # sanity check on the filtered list: prints "Number of Dups:0"

# print('All keys in a single item:')
# print(json.dumps(merch[0], indent=2))

# Save the raw (not deduplicated) search results as JSON
with open("mercariusdata.txt", "w") as f:
    json.dump(merch, f)
```
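Since the worry above is about passing the wrong type, here is a minimal sketch of the same duplicate check split into two small helpers with type hints and an explicit type check. The names `iter_unique_by_id`, `count_duplicates`, and `_item_id` are hypothetical (they are not part of mercarius), and the sketch assumes each search result is a dict with an `"id"` key, as the code above does.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable, Iterator, Mapping
from typing import Any


def _item_id(item: Any) -> str:
    # Fail loudly if something other than a dict-like item with an "id" is passed in
    if not isinstance(item, Mapping) or "id" not in item:
        raise TypeError(f"expected a dict with an 'id' key, got {type(item).__name__}")
    return item["id"]


def iter_unique_by_id(items: Iterable[Mapping[str, Any]]) -> Iterator[Mapping[str, Any]]:
    """Yield each item the first time its "id" is seen, preserving the original order."""
    seen: set[str] = set()
    for item in items:
        item_id = _item_id(item)
        if item_id not in seen:
            seen.add(item_id)
            yield item


def count_duplicates(items: Iterable[Mapping[str, Any]]) -> int:
    """Count items whose "id" already appeared earlier (the same total DupCheckers prints)."""
    counts = Counter(_item_id(item) for item in items)
    return sum(n - 1 for n in counts.values())
```

Because `iter_unique_by_id` is a generator, it could, assuming `search` yields results lazily, wrap the search results directly, e.g. `list(iter_unique_by_id(search('plush touhou', status=SearchItemStatus.ON_SALE)))`. Note that `count_duplicates` consumes its input, so pass it a list rather than a one-shot generator if you also need the items afterwards.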
Gresliebear commented 1 year ago

@marvinody I am struggling with pushing to the repo. Is there a permission issue, or could this be my git config?

```
git push --set-upstream origin duplicate-checker
remote: Permission to marvinody/mercarius.git denied to Gresliebear.
fatal: unable to access 'https://github.com/marvinody/mercarius.git/': The requested URL returned error: 403
```

marvinody commented 1 year ago

You need to fork the repo on GitHub, then set your fork as the remote you push to. From there, push to your own branch, then come back to this GitHub project and open a PR if you want.

If you make the PR, it'll be easier to comment line by line on some quick changes to make.

Gresliebear commented 1 year ago

Yeah, I was hoping it would let me create a branch directly in your repo. Oh well.

Hopefully the duplicate checker works with the search class you made. I am going to pull a ton of Squishmallow data for my project, but I am worried about getting IP-banned, so I was thinking about implementing proxies to rotate IPs until I get all the data, or setting up a server. I was googling it; I will look into it if I get banned.

Thank you again!
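On the IP-ban worry: a common first step is to slow down and retry with exponential backoff, optionally cycling through a list of proxies round-robin. This is a generic, hypothetical sketch: `fetch_with_rotation` is not part of mercarius, `fetch` stands for whatever function you write to perform one request through a given proxy (the `search` call used above does not show a proxy parameter, so whether your request code can take one is an assumption), and treating every exception as retryable is a simplification.

```python
import itertools
import time
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_rotation(
    fetch: Callable[[Optional[str]], T],
    proxies: Iterable[Optional[str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(proxy), cycling through proxies and backing off exponentially after failures."""
    # Round-robin over the proxies; None means "no proxy" when the list is empty
    proxy_cycle = itertools.cycle(list(proxies) or [None])
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        proxy = next(proxy_cycle)
        try:
            return fetch(proxy)
        except Exception as exc:  # simplification: every error is treated as retryable
            last_error = exc
            if attempt < max_attempts - 1:
                # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

Passing `sleep` in as a parameter makes the delays easy to test; in real use you would keep the default `time.sleep`.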